DOI: 10.1145/3448016.3452762
short-paper

PyExplore: Query Recommendations for Data Exploration without Query Logs

Published: 18 June 2021

Abstract

Helping users explore data becomes increasingly important as databases grow larger and more complex. In this demo, we present PyExplore, a data exploration tool that helps end users formulate queries over unfamiliar datasets. PyExplore takes as input an initial query from the user, along with some parameters, and recommends interesting queries by leveraging data correlations and diversity.

    Published In

    SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data
    June 2021
    2969 pages
    ISBN:9781450383431
    DOI:10.1145/3448016

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. clustering
    2. data exploration
    3. query recommendations

    Qualifiers

    • Short-paper

    Conference

    SIGMOD/PODS '21
    Acceptance Rates

    Overall Acceptance Rate 785 of 4,003 submissions, 20%
