PyExplore 2.0: Explainable, Approximate and Combined Clustering Based SQL Query Recommendations

Glenis, Apostolos

doi:10.1007/978-3-031-51643-6_7


Part of the book series: Communications in Computer and Information Science (CCIS, volume 2022)

Included in the following conference series:

International Conference on Management of Digital EcoSystems


Abstract

While the benefits of data exploration are becoming increasingly prominent, factors such as data volume and complexity, and user unfamiliarity with the database contents, make querying data a non-trivial, time-consuming process. The big challenge for users is knowing which query to ask at any point. PyExplore is a data exploration framework that aims to help users explore datasets by providing SQL query recommendations. The user provides an initial SQL query, and pyExplore then recommends new SQL queries with an augmented WHERE clause. In this paper, we extend pyExplore with four new workflows: one for approximate query recommendations, one for explainable query completions, one for combined explainable and approximate recommendations, and finally a sampled decision tree workflow that is similar to pyExplore's original workflow but processes only a small portion of the dataset. The purpose of the explainable workflows is to provide recommendations that are intuitive to the end user, while the purpose of the approximate workflows is to significantly reduce execution time compared to the full workflow. We evaluated the four workflows in terms of execution time and speedup compared to the full workflow. We found that (a) the quality of the approximate recommendations is on par with the full workflow, (b) the explainable workflow is faster than using a decision tree classifier to produce the queries, and (c) the approximate workflow is significantly faster than the full workflow.
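The abstract names the ingredients of the new workflows but not their algorithms. Purely as an illustration, the Python sketch below combines two of those ideas under stated assumptions: it clusters a uniform random sample of the initial query's result (the "approximate" idea) and turns each cluster's per-column minimum and maximum into BETWEEN predicates appended to the initial query (one simple, human-readable way to describe a cluster). The function names `sample_rows`, `kmeans` and `recommend_queries`, the bounding-box predicates and the default sampling fraction are hypothetical choices made for this sketch; they are not pyExplore's actual implementation. The sketch also assumes numeric columns and rows already loaded in memory.

```python
import random
from typing import Dict, List, Sequence

Row = Dict[str, float]


def sample_rows(rows: Sequence[Row], fraction: float, seed: int = 0) -> List[Row]:
    """Uniform random sample without replacement (the approximate step)."""
    rng = random.Random(seed)
    n = max(1, int(len(rows) * fraction))
    return rng.sample(list(rows), n)


def kmeans(points: List[List[float]], k: int, iters: int = 20, seed: int = 0) -> List[int]:
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    k = min(k, len(points))
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def recommend_queries(base_sql: str, rows: Sequence[Row], columns: List[str],
                      k: int, fraction: float = 0.1, seed: int = 0) -> List[str]:
    """Hypothetical helper: one recommended query per non-empty cluster of the sample."""
    sample = sample_rows(rows, fraction, seed)
    points = [[r[c] for c in columns] for r in sample]
    labels = kmeans(points, k, seed=seed)
    queries = []
    for c in sorted(set(labels)):
        members = [p for p, lab in zip(points, labels) if lab == c]
        # Readable predicate: the cluster's observed range on each column.
        preds = [f"{col} BETWEEN {min(vals)} AND {max(vals)}"
                 for col, vals in zip(columns, zip(*members))]
        queries.append(f"SELECT * FROM ({base_sql}) AS q WHERE " + " AND ".join(preds))
    return queries
```

For example, calling `recommend_queries("SELECT * FROM t", rows, ["x", "y"], k=2, fraction=1.0)` on two well-separated groups of points should return one query per group, each restricting `x` and `y` to that group's observed range. Lowering `fraction` trades recommendation quality for speed, which is the trade-off the approximate workflows are evaluated on. Because the columns are not scaled here, columns with large ranges dominate the distance; a practical version would normalize them first.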



Author information

Authors and Affiliations

Athena RC, Athens, Greece
Apostolos Glenis

Corresponding author

Correspondence to Apostolos Glenis.

Editor information

Editors and Affiliations

University of Pau & Pays de l'Adour, Anglet, France
Richard Chbeir
Claude Bernard University Lyon 1, Villeurbanne Cedex, France
Djamal Benslimane
Technical University of Crete, Chania, Greece
Michalis Zervakis
Open University of Cyprus, Nicosia, Cyprus
Yannis Manolopoulos
Wroclaw University of Science, Wroclaw, Poland
Ngoc Thanh Ngyuen
Lebanese American University, Byblos, Lebanon
Joe Tekli


About this paper

Cite this paper

Glenis, A. (2024). PyExplore 2.0: Explainable, Approximate and Combined Clustering Based SQL Query Recommendations. In: Chbeir, R., Benslimane, D., Zervakis, M., Manolopoulos, Y., Ngyuen, N.T., Tekli, J. (eds) Management of Digital EcoSystems. MEDES 2023. Communications in Computer and Information Science, vol 2022. Springer, Cham. https://doi.org/10.1007/978-3-031-51643-6_7


DOI: https://doi.org/10.1007/978-3-031-51643-6_7
Published: 02 February 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-51642-9
Online ISBN: 978-3-031-51643-6
eBook Packages: Computer Science, Computer Science (R0)
