DOI: 10.1145/2618243.2618253
research-article

DivIDE: efficient diversification for interactive data exploration

Published: 30 June 2014

Abstract

Today, Interactive Data Exploration (IDE) is a key component of many discovery-oriented applications, in which users repeatedly submit exploratory queries to identify interesting subspaces in large data sets. Returning relevant yet diverse results to such queries gives users quick insight into a large data space. However, search result diversification adds further cost to an already computationally expensive exploration process. To address this challenge, we propose a novel diversification scheme called DivIDE, which efficiently diversifies the results of queries posed during data exploration sessions. In particular, our scheme exploits the properties of data diversification functions and leverages the natural overlap between the results of different queries to significantly reduce processing costs. Our extensive experimental evaluation on both synthetic and real data sets shows that our scheme provides significant benefits compared to existing methods.
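The abstract does not spell out DivIDE's algorithm. As a rough illustration of the two ideas it names, the sketch below assumes a MaxMin diversity objective (choose k results that maximize the smallest pairwise Euclidean distance), computed with the standard greedy farthest-point heuristic, and models "overlap between the results of different queries" by seeding each query's selection with those items from the previous diversified result that also appear in the new result. The names `greedy_maxmin` and `diversify_session` are hypothetical; this is a simplified sketch, not the paper's method.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]  # assumption: each result is a point in a numeric space


def greedy_maxmin(results: Sequence[Point], k: int,
                  seed: Sequence[Point] = ()) -> List[Point]:
    """Pick up to k results, greedily maximizing the minimum pairwise distance.

    Seed points that also occur in `results` are kept first (the reuse step);
    the remaining slots are filled by the farthest-point rule.
    """
    if k <= 0:
        return []
    chosen = [p for p in seed if p in results][:k]
    remaining = [p for p in results if p not in chosen]
    if not chosen and remaining:
        chosen.append(remaining.pop(0))  # arbitrary starting point
    while len(chosen) < k and remaining:
        # Farthest-point step: the candidate whose nearest chosen point is farthest away.
        best = max(remaining, key=lambda p: min(math.dist(p, c) for c in chosen))
        chosen.append(best)
        remaining.remove(best)
    return chosen


def diversify_session(queries: Sequence[Sequence[Point]], k: int) -> List[List[Point]]:
    """Diversify each query's results in turn, seeding with the previous diverse set."""
    previous: List[Point] = []
    session = []
    for results in queries:
        current = greedy_maxmin(results, k, seed=previous)
        session.append(current)
        previous = current
    return session


if __name__ == "__main__":
    q1 = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (10.0, 0.0)]
    q2 = [(5.0, 0.0), (10.0, 0.0), (11.0, 0.0), (20.0, 0.0)]
    print(diversify_session([q1, q2], k=2))
```

In this toy session the point (10.0, 0.0) is shared by both query results, so the second selection starts from it instead of from scratch. Seeding only skips re-selecting overlapping items; it can also yield a lower MaxMin score than diversifying each result independently, so any cost/quality trade-off would need to be measured rather than assumed.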


Information

Published In

ACM Other conferences
SSDBM '14: Proceedings of the 26th International Conference on Scientific and Statistical Database Management
June 2014
417 pages
ISBN: 9781450327220
DOI: 10.1145/2618243

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SSDBM '14

Acceptance Rates

SSDBM '14 paper acceptance rate: 26 of 71 submissions (37%)
Overall acceptance rate: 56 of 146 submissions (38%)
