Abstract
Interactive data exploration platforms in Web, business and scientific domains are becoming increasingly popular. Typically, users without prior knowledge of the data interact with these platforms in an exploratory manner, hoping to retrieve the results they are looking for. One way to explore large-volume data is to pose aggregate queries, which combine the values of multiple rows through an aggregate operator into a single aggregated value. However, when a query fails, i.e., returns an undesired aggregated value, users must undertake a frustrating trial-and-error process to refine their queries until a desired result is attained. This process is becoming increasingly difficult because the underlying data is typically of large volume and high dimensionality. While heuristic-based techniques are fairly successful in generating refined queries that meet specified requirements on the aggregated values, they are oblivious to the (dis)similarity between the input query and its refined version. Meanwhile, enforcing similarity-aware query refinement is a non-trivial challenge, as it requires a careful examination of the query space while maintaining a low processing cost. To address this challenge, we propose EAGER, an efficient scheme for similarity-aware refinement of aggregation queries, which aims to balance the tradeoff between satisfying the aggregate and similarity constraints imposed on the refined query so as to maximize its overall benefit to the user. To achieve that goal, EAGER implements efficient strategies that minimize the cost of exploring the available search space: it uses similarity-based and monotonic-based pruning techniques to bound the search space and quickly find a refined query that meets users' expectations. Our extensive experiments show that EAGER scales under various workload settings and provides significant benefits.
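To make the tradeoff described above concrete, the following is a minimal sketch of similarity-aware refinement for a single range predicate. It is not EAGER itself: the function names (`aggregate_sum`, `refine`), the weight `alpha`, the weighted-sum cost, and the normalizations are all assumptions introduced for illustration, and the search is a plain exhaustive enumeration without any of the pruning techniques the abstract mentions.

```python
from itertools import product


def aggregate_sum(rows, lo, hi):
    """SUM(value) over rows whose attribute lies in the range [lo, hi]."""
    return sum(value for attr, value in rows if lo <= attr <= hi)


def refine(rows, query, target, candidates, alpha=0.5):
    """Pick the candidate range with the lowest combined cost.

    Assumed objective (hypothetical, for illustration only):
        cost = alpha * aggregate deviation + (1 - alpha) * dissimilarity
    where deviation is the relative distance from the target aggregate and
    dissimilarity is the normalized shift of the range bounds.
    """
    lo0, hi0 = query
    span = max(candidates) - min(candidates) or 1  # avoid division by zero
    best, best_cost = None, float("inf")
    # Exhaustive search over all (lo, hi) pairs; no pruning is applied here.
    for lo, hi in product(candidates, repeat=2):
        if lo > hi:
            continue
        deviation = abs(aggregate_sum(rows, lo, hi) - target) / max(abs(target), 1)
        dissimilarity = (abs(lo - lo0) + abs(hi - hi0)) / (2 * span)
        cost = alpha * deviation + (1 - alpha) * dissimilarity
        if cost < best_cost:
            best, best_cost = (lo, hi), cost
    return best, best_cost


# Toy data: (attribute, value) pairs. The input query [2, 3] sums to 50,
# while the user wants an aggregated value of 90.
rows = [(1, 10), (2, 20), (3, 30), (4, 40), (5, 50)]
refined, cost = refine(rows, query=(2, 3), target=90, candidates=[1, 2, 3, 4, 5])
print(refined, cost)
```

On this toy data both [2, 4] and [4, 5] reach the target sum of 90, but the sketch prefers [2, 4] because it moves only one bound of the input query by one step. A refinement that looked only at the aggregate constraint could not distinguish between the two.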
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11280-017-0434-4/MediaObjects/11280_2017_434_Fig1_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11280-017-0434-4/MediaObjects/11280_2017_434_Fig2_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11280-017-0434-4/MediaObjects/11280_2017_434_Fig3_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11280-017-0434-4/MediaObjects/11280_2017_434_Fig4_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11280-017-0434-4/MediaObjects/11280_2017_434_Fig5_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11280-017-0434-4/MediaObjects/11280_2017_434_Fig6_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11280-017-0434-4/MediaObjects/11280_2017_434_Fig7_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11280-017-0434-4/MediaObjects/11280_2017_434_Fig8_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11280-017-0434-4/MediaObjects/11280_2017_434_Fig9_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11280-017-0434-4/MediaObjects/11280_2017_434_Fig10_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11280-017-0434-4/MediaObjects/11280_2017_434_Fig11_HTML.gif)
Notes
Categorical predicates are beyond the scope of this paper and left for future work.
avg is a special case, which we address later on.
Cite this article
Albarrak, A.M., Sharaf, M.A. Efficient schemes for similarity-aware refinement of aggregation queries. World Wide Web 20, 1237–1267 (2017). https://doi.org/10.1007/s11280-017-0434-4
DOI: https://doi.org/10.1007/s11280-017-0434-4