Efficient schemes for similarity-aware refinement of aggregation queries

Albarrak, Abdullah M.; Sharaf, Mohamed A.

doi:10.1007/s11280-017-0434-4

Published: 23 January 2017

Volume 20, pages 1237–1267, (2017)

World Wide Web

Abdullah M. Albarrak¹ &
Mohamed A. Sharaf¹

Abstract

Interactive data exploration platforms in Web, business and scientific domains are becoming increasingly popular. Typically, users without prior knowledge of the data interact with these platforms in an exploratory manner, hoping to retrieve the results they are looking for. One way to explore large-volume data is to pose aggregate queries, which apply an aggregate operator to the values of multiple rows to form a single aggregated value. However, when a query fails, i.e., returns an undesired aggregated value, users have to undertake a frustrating trial-and-error process to refine their queries until a desired result is attained. This exploration process is becoming increasingly difficult because the underlying data is typically of large volume and high dimensionality. While heuristic-based techniques are fairly successful in generating refined queries that meet specified requirements on the aggregated values, they are oblivious to the (dis)similarity between the input query and its refined version. Enforcing similarity-aware query refinement is a non-trivial challenge, as it requires a careful examination of the query space while maintaining a low processing cost. To address this challenge, we propose EAGER, a scheme for efficient similarity-aware refinement of aggregation queries, which aims to balance the tradeoff between satisfying the aggregate and similarity constraints imposed on the refined query so as to maximize its overall benefit to the user. To achieve that goal, EAGER minimizes the cost of exploring the available search space by using similarity-based and monotonic-based pruning techniques to bound the search space and quickly find a refined query that meets users' expectations. Our extensive experiments show the scalability exhibited by EAGER under various workload settings, and the significant benefits it provides.
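To make the tradeoff described in the abstract concrete, the following is a minimal sketch of a naive, exhaustive baseline. It is not EAGER and applies none of its pruning techniques. It assumes a single integer range predicate on an attribute x, a SUM aggregate over an attribute y, and a cost that mixes aggregate deviation and distance to the input query with a weight alpha. The function names, the cost definition, and alpha are all hypothetical choices made for illustration.

```python
from itertools import product


def aggregate_sum(rows, lo, hi):
    """SUM(y) over rows whose x falls in the closed range [lo, hi]."""
    return sum(y for x, y in rows if lo <= x <= hi)


def refine(rows, query, target, alpha=0.5, step=1):
    """Exhaustively score candidate ranges and return (best_range, best_cost).

    cost = alpha * aggregate deviation + (1 - alpha) * distance to the input
    query, with both terms scaled to [0, 1]. The weight alpha and both
    measures are illustrative assumptions, not definitions from the paper.
    Assumes integer x values so candidate bounds can be enumerated.
    """
    xs = [x for x, _ in rows]
    x_min, x_max = min(xs), max(xs)
    width = (x_max - x_min) or 1  # avoid division by zero on a single x value
    bounds = range(x_min, x_max + 1, step)

    best, best_cost = None, float("inf")
    for lo, hi in product(bounds, bounds):
        if lo > hi:
            continue  # skip empty ranges
        # How far the candidate's aggregate is from the desired value.
        deviation = abs(aggregate_sum(rows, lo, hi) - target) / max(abs(target), 1)
        deviation = min(deviation, 1.0)
        # How far the candidate's bounds moved from the input query's bounds.
        distance = (abs(lo - query[0]) + abs(hi - query[1])) / (2 * width)
        cost = alpha * deviation + (1 - alpha) * distance
        if cost < best_cost:
            best, best_cost = (lo, hi), cost
    return best, best_cost


if __name__ == "__main__":
    rows = [(1, 10), (2, 20), (3, 30), (4, 40), (5, 50)]
    # Input query: x in [2, 3] gives SUM(y) = 50, but the user wants 90.
    print(refine(rows, query=(2, 3), target=90))
```

In this toy data set, both [2, 4] and [4, 5] yield the target sum of 90, but the sketch prefers [2, 4] because it stays closer to the input range. The exhaustive loop evaluates every candidate range, which is the kind of cost that bounding the search space is meant to avoid.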

Notes

http://www.sdss.org
Categorical predicates are beyond the scope of this paper and left for future work.
avg is a special case, which is addressed later in the paper.
http://www.sdss.org

Author information

Authors and Affiliations

University of Queensland, Queensland, Australia
Abdullah M. Albarrak & Mohamed A. Sharaf

Corresponding author

Correspondence to Abdullah M. Albarrak.

About this article

Cite this article

Albarrak, A.M., Sharaf, M.A. Efficient schemes for similarity-aware refinement of aggregation queries. World Wide Web 20, 1237–1267 (2017). https://doi.org/10.1007/s11280-017-0434-4

Received: 09 December 2015
Revised: 19 October 2016
Accepted: 11 January 2017
Published: 23 January 2017
Issue Date: November 2017
DOI: https://doi.org/10.1007/s11280-017-0434-4
