Efficient schemes for similarity-aware refinement of aggregation queries

World Wide Web

Abstract

Interactive data exploration platforms in Web, business, and scientific domains are becoming increasingly popular. Typically, users without prior knowledge of the data interact with these platforms in an exploratory manner, hoping to retrieve the results they are looking for. One way to explore large-volume data is to pose aggregate queries, which combine the values of multiple rows through an aggregate operator to form a single aggregated value. However, when a query fails, i.e., returns an undesired aggregated value, users must undertake a frustrating trial-and-error process to refine their queries until a desired result is attained. This exploration process is growing more difficult, as the underlying data is typically large in volume and high in dimensionality. While heuristic-based techniques are fairly successful in generating refined queries that meet specified requirements on the aggregated values, they are oblivious to the (dis)similarity between the input query and its refined version. Meanwhile, enforcing similarity-aware query refinement is a non-trivial challenge, as it requires a careful examination of the query space while maintaining a low processing cost. To address this challenge, we propose EAGER, an efficient scheme for similarity-aware refinement of aggregation queries, which aims to balance the tradeoff between satisfying the aggregate and similarity constraints imposed on the refined query so as to maximize its overall benefit to the user. To achieve that goal, EAGER minimizes the cost of exploring the available search space by using similarity-based and monotonicity-based pruning techniques to bound the search space and quickly find a refined query that meets users’ expectations. Our extensive experiments show the scalability of EAGER under various workload settings and the significant benefits it provides.
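To make the tradeoff described in the abstract concrete, the following is a minimal Python sketch of similarity-aware refinement for a single range predicate with a COUNT constraint. It is not the EAGER algorithm, whose details are not given here. The function name `refine_upper_bound`, the weight `alpha`, the normalized cost function, and the fixed `step` grid are all assumptions made for illustration. The sketch shows only the general idea of similarity-based pruning: because the similarity cost grows with the distance from the original predicate, the search can stop once that cost alone is no better than the best total cost found so far.

```python
from bisect import bisect_left, bisect_right


def refine_upper_bound(values, low, high, target_count,
                       alpha=0.5, step=1.0, max_steps=1000):
    """Refine `value BETWEEN low AND high` so COUNT(*) approaches target_count.

    Hypothetical cost (an assumption, not taken from the paper):
        alpha * relative COUNT deviation
        + (1 - alpha) * normalized shift of the upper bound.
    Returns (refined_high, resulting_count, cost).
    """
    data = sorted(values)
    span = (data[-1] - data[0]) or 1.0  # normalizer for predicate distance

    def count(hi):
        # Rows with low <= value <= hi, found by binary search.
        return max(0, bisect_right(data, hi) - bisect_left(data, low))

    def cost(hi):
        agg_dev = abs(count(hi) - target_count) / max(target_count, 1)
        sim_dev = abs(hi - high) / span
        return alpha * agg_dev + (1 - alpha) * sim_dev

    best_hi, best_cost = high, cost(high)
    for k in range(1, max_steps + 1):
        delta = k * step
        # Lower bound on the cost of any candidate at this distance:
        # the aggregate deviation can be at best zero.
        if (1 - alpha) * delta / span >= best_cost:
            break  # similarity-based pruning: farther candidates cannot win
        for hi in (high - delta, high + delta):
            c = cost(hi)
            if c < best_cost:
                best_hi, best_cost = hi, c
    return best_hi, count(best_hi), best_cost


# Usage: widen `value BETWEEN 10 AND 20` over the values 1..100
# so that the query returns about 21 rows.
refined_high, refined_count, refined_cost = refine_upper_bound(
    list(range(1, 101)), low=10, high=20, target_count=21)
print(refined_high, refined_count, refined_cost)
```

A larger `alpha` favors meeting the aggregate constraint, while a smaller one favors staying close to the original query. A real system would refine several predicates at once and would need a smarter traversal than this one-dimensional grid.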

Notes

  1. http://www.sdss.org

  2. Categorical predicates are beyond the scope of this paper and left for future work.

  3. avg is a special case that we address later on.

  4. http://www.sdss.org

Author information

Corresponding author

Correspondence to Abdullah M. Albarrak.

About this article

Cite this article

Albarrak, A.M., Sharaf, M.A. Efficient schemes for similarity-aware refinement of aggregation queries. World Wide Web 20, 1237–1267 (2017). https://doi.org/10.1007/s11280-017-0434-4

  • DOI: https://doi.org/10.1007/s11280-017-0434-4
