research-article

Answering top-k representative queries on graph databases

Authors:

Ambuj Singh

SIGMOD '14: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data

Pages 1163–1174

https://doi.org/10.1145/2588555.2610524

Published: 18 June 2014

Abstract

Given a function that classifies a data object as relevant or irrelevant, we consider the task of selecting k objects that best represent all relevant objects in the underlying database. This problem occurs naturally when analysts want to familiarize themselves with the relevant objects in a database using a small set of k exemplars. In this paper, we solve the problem of top-k representative queries on graph databases. While graph databases model a wide range of scientific data, solving the problem in the context of graphs presents unique challenges due to the inherent complexity of matching structures. Furthermore, top-k representative queries map to the classic Set Cover problem, making the problem NP-hard. To overcome these challenges, we develop a greedy approximation with theoretical guarantees on the quality of the answer set, noting that a better approximation is not feasible in polynomial time. To reduce the quadratic computational cost of the greedy algorithm, we propose an index structure called NB-Index that indexes the θ-neighborhoods of the database graphs by employing a novel combination of Lipschitz embedding and agglomerative clustering. Extensive experiments on real graph datasets validate the efficiency and effectiveness of the proposed techniques, which achieve up to two orders of magnitude speed-up over state-of-the-art algorithms.
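The greedy selection described in the abstract can be pictured with a short sketch. It is a minimal illustration under stated assumptions, not the paper's algorithm: the caller supplies a hypothetical distance function `dist` (for graphs this would be some graph distance, typically expensive to compute), a graph is assumed to represent every relevant graph within distance θ of it (its θ-neighborhood), candidates are drawn from the distinct relevant objects themselves, and the function names are illustrative. Each round picks the object whose θ-neighborhood adds the most not-yet-covered objects; this is the classic greedy rule for maximum coverage, which is known to cover at least a (1 − 1/e) fraction of what the best k objects could cover. Building all neighborhoods needs one distance computation per pair of objects, which shows where a quadratic cost comes from.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Set, Tuple

# Assumed signature: any symmetric distance between two (hashable) objects.
Dist = Callable[[Hashable, Hashable], float]


def theta_neighborhoods(objs: Sequence[Hashable], dist: Dist,
                        theta: float) -> Dict[Hashable, Set[Hashable]]:
    """Map each object to the set of objects within distance theta (itself included)."""
    nbrs: Dict[Hashable, Set[Hashable]] = {o: {o} for o in objs}
    # One distance call per unordered pair: quadratic in len(objs).
    for i, a in enumerate(objs):
        for b in objs[i + 1:]:
            if dist(a, b) <= theta:
                nbrs[a].add(b)
                nbrs[b].add(a)
    return nbrs


def greedy_top_k_representatives(objs: Sequence[Hashable], dist: Dist,
                                 theta: float, k: int
                                 ) -> Tuple[List[Hashable], Set[Hashable]]:
    """Greedy maximum coverage over theta-neighborhoods (illustrative sketch)."""
    nbrs = theta_neighborhoods(objs, dist, theta)
    chosen: List[Hashable] = []
    covered: Set[Hashable] = set()
    for _ in range(min(k, len(objs))):
        # Marginal gain = number of objects this candidate would newly cover.
        best = max((o for o in objs if o not in chosen),
                   key=lambda o: len(nbrs[o] - covered))
        gain = nbrs[best] - covered
        if not gain:
            break  # every relevant object is already represented
        chosen.append(best)
        covered |= gain
    return chosen, covered


if __name__ == "__main__":
    # Toy stand-in for graphs: 1-D points with absolute difference as distance.
    points = [1.0, 1.5, 2.0, 10.0, 10.4, 20.0]
    reps, cov = greedy_top_k_representatives(points, lambda a, b: abs(a - b), 1.0, 2)
    print(reps, len(cov))  # prints: [1.0, 10.0] 5
```

Ties are broken by input order because Python's `max` returns the first maximal element; any other deterministic rule would preserve the same coverage guarantee.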

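Why a Lipschitz embedding helps index θ-neighborhoods can be shown with a sketch of the general technique; it omits the agglomerative clustering and is not the NB-Index structure itself. Choose a few reference sets R_1 to R_m of database objects (picked at random here, which is an assumption) and map each object x to the vector (d(x, R_1), ..., d(x, R_m)), where d(x, R) is the distance from x to its nearest member of R. For any metric d, the triangle inequality gives |d(x, R) − d(y, R)| ≤ d(x, y), so the largest coordinate difference between two embeddings is a lower bound on their true distance. A pair whose lower bound already exceeds θ cannot lie in each other's θ-neighborhood, so its expensive distance computation can be skipped. All function names below are hypothetical, and 2-D points with Euclidean distance stand in for graphs.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def lipschitz_embed(x, ref_sets: Sequence[Sequence], dist: Callable) -> Tuple[float, ...]:
    """Coordinate i is the distance from x to its closest member of ref_sets[i]."""
    return tuple(min(dist(x, r) for r in ref_set) for ref_set in ref_sets)


def embedding_lower_bound(u: Sequence[float], v: Sequence[float]) -> float:
    """L-infinity distance between embeddings; never exceeds the true metric distance."""
    return max(abs(a - b) for a, b in zip(u, v))


def theta_pairs_with_pruning(objs: Sequence, dist: Callable, theta: float,
                             ref_sets: Sequence[Sequence]) -> Tuple[List[tuple], int]:
    """Return all pairs within theta and the number of exact pairwise distance calls."""
    emb = [lipschitz_embed(o, ref_sets, dist) for o in objs]
    pairs: List[tuple] = []
    exact_calls = 0
    for i in range(len(objs)):
        for j in range(i + 1, len(objs)):
            if embedding_lower_bound(emb[i], emb[j]) > theta:
                continue  # lower bound already exceeds theta: safe to prune
            exact_calls += 1
            if dist(objs[i], objs[j]) <= theta:
                pairs.append((objs[i], objs[j]))
    return pairs, exact_calls


if __name__ == "__main__":
    random.seed(7)
    pts: List[Point] = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(200)]
    # Assumption: three reference sets of four random database points each.
    refs = [random.sample(pts, 4) for _ in range(3)]
    pairs, calls = theta_pairs_with_pruning(pts, math.dist, 5.0, refs)
    total = len(pts) * (len(pts) - 1) // 2
    print(f"{len(pairs)} pairs within theta; {calls} of {total} exact distance calls")
```

The pruning never drops a true θ-neighbor, so the result matches a brute-force scan; the saving depends on how well the reference sets spread objects apart. The embedding itself costs a fixed number of distance calls per object, which `exact_calls` does not count.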
References

[1] SNAP. http://snap.stanford.edu/.
[2] R. Agrawal, S. Gollapudi, A. Halverson, and S. Ieong. Diversifying search results. In WSDM, 2009.
[3] A. Angel and N. Koudas. Efficient diversity-aware search. In SIGMOD Conference, pages 781–792, 2011.
[4] C. G. Ballard. Advances in the treatment of Alzheimer's disease: benefits of dual cholinesterase inhibition. Eur Neurol, 47(1):64–70, 2002.
[5] J. Bourgain. On Lipschitz embedding of finite metric spaces in Hilbert space. Israel Journal of Mathematics, 52:46–52, 1985.
[6] G. Capannini, F. M. Nardini, R. Perego, and F. Silvestri. Efficient diversification of web search results. PVLDB, 4, 2011.
[7] V. Chaoji, S. Ranu, R. Rastogi, and R. Bhatt. Recommendations to boost content spread in social networks. In WWW, pages 529–538, 2012.
[8] H. Cheng, D. Lo, Y. Zhou, X. Wang, and X. Yan. Identifying bug signatures using discriminative graph mining. In Symposium on Software Testing and Analysis, 2009.
[9] M. Drosou and E. Pitoura. DisC diversity: result diversification based on dissimilarity and coverage. PVLDB, 6(1):13–24, 2012.
[10] DUD. http://dud.docking.org/r2/.
[11] U. Feige. A threshold of ln n for approximating set cover. J. ACM, 45:634–652, July 1998.
[12] H. He and A. K. Singh. Closure-tree: An index structure for graph queries. In ICDE, 2006.
[13] M. Hua, J. Pei, A. Fu, X. Lin, and H.-F. Leung. Top-k typicality queries and efficient query answering methods on large databases. The VLDB Journal, 18(3):809–835, 2009.
[14] M. Hua, J. Pei, A. W. C. Fu, X. Lin, and H.-F. Leung. Efficiently answering top-k typicality queries on large databases. In VLDB, pages 890–901, 2007.
[15] R. Li and J. X. Yu. Scalable diversified ranking on graphs. In ICDM, 2011.
[16] X. Lin, Y. Yuan, Q. Zhang, and Y. Zhang. Selecting stars: The k most representative skyline operator. In ICDE, pages 86–95, 2007.
[17] K. Macropol and A. Singh. Content-based modeling and prediction of information dissemination. In ASONAM, 2011.
[18] G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. An analysis of approximations for maximizing submodular set functions-I. Mathematical Programming, 14(1):265–294, 1978.
[19] L. Qin, J. X. Yu, and L. Chang. Diversifying top-k results. PVLDB, 5(11):1124–1135, 2012.
[20] S. Ranu, B. T. Calhoun, A. K. Singh, and S. J. Swamidass. Probabilistic substructure mining from small-molecule screens. Molecular Informatics, 30(9):809–815, 2011.
[21] S. Ranu, M. Hoang, and A. Singh. Mining discriminative subgraphs from global-state networks. In SIGKDD, pages 509–517, 2013.
[22] S. Ranu and A. K. Singh. Answering top-k queries over a mixture of attractive and repulsive dimensions. PVLDB, 5(3):169–180, 2011.
[23] S. Ranu and A. K. Singh. Indexing and mining topological patterns for drug discovery. In EDBT, pages 562–565, 2012.
[24] H. Tong, J. He, Z. Wen, R. Konuru, and C.-Y. Lin. Diversified ranking on large graphs: an optimization viewpoint. In SIGKDD, 2011.
[25] E. Vee, U. Srivastava, J. Shanmugasundaram, P. Bhat, and S. Amer-Yahia. Efficient computation of diverse query results. In ICDE, 2008.
[26] X. Yan, P. S. Yu, and J. Han. Substructure similarity search in graph databases. In SIGMOD, 2005.
[27] C. Yu, L. Lakshmanan, and S. Amer-Yahia. It takes variety to make a world: diversification in recommender systems. In EDBT, 2009.
[28] Z. Zeng, A. K. H. Tung, J. Wang, J. Feng, and L. Zhou. Comparing stars: On approximating graph edit distance. PVLDB, 2(1), 2009.
[29] P. Zezula, G. Amato, V. Dohnal, and M. Batko. Similarity Search: The Metric Space Approach. Springer-Verlag, 2006.
[30] Y. Zhu, L. Qin, J. X. Yu, and H. Cheng. Finding top-k similar graphs in graph databases. In EDBT, pages 456–467, 2012.
[31] L. Zou, L. Chen, and M. T. Özsu. Distance-join: Pattern match query in a large graph database. PVLDB, 2(1):886–897, 2009.

Index Terms

Answering top-k representative queries on graph databases
1. Information systems
  1. Information retrieval
    1. Information retrieval query processing

Published In

SIGMOD '14: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data

June 2014

1645 pages

ISBN:9781450323765

DOI:10.1145/2588555

General Chairs:
Curtis Dyreson, Utah State University, USA
Feifei Li, University of Utah, USA

Program Chair:
M. Tamer Özsu, University of Waterloo, Canada

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGMOD: ACM Special Interest Group on Management of Data

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 June 2014

Qualifiers

Research-article

Conference

SIGMOD/PODS'14

Sponsor:

SIGMOD

SIGMOD/PODS'14: International Conference on Management of Data

June 22–27, 2014

Snowbird, Utah, USA

Acceptance Rates

SIGMOD '14 Paper Acceptance Rate: 107 of 421 submissions, 25%

Overall Acceptance Rate: 785 of 4,003 submissions, 20%

Article Metrics

Total Citations: 26
Total Downloads: 875
Downloads (last 12 months): 15
Downloads (last 6 weeks): 4

Reflects downloads up to 28 Feb 2025.

Cited By

R. Ranjan, S. Grover, S. Medya, V. Chakravarthy, Y. Sabharwal, and S. Ranu (2022). GREED. In Proceedings of the 36th International Conference on Neural Information Processing Systems (eds. S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh), pages 22518–22530. doi:10.5555/3600270.3601906. Online publication date: 28 Nov 2022.
https://dl.acm.org/doi/10.5555/3600270.3601906

T. Siddiqui, S. Jo, W. Wu, C. Wang, V. Narasayya, and S. Chaudhuri (2022). ISUM: Efficiently Compressing Large and Complex Workloads for Scalable Index Tuning. In Proceedings of the 2022 International Conference on Management of Data (eds. Z. Ives, A. Bonifati, and A. El Abbadi), pages 660–673. doi:10.1145/3514221.3526152. Online publication date: 10 Jun 2022.
https://dl.acm.org/doi/10.1145/3514221.3526152

J. Kim (2022). Efficient Top-k Graph Similarity Search With GED Constraints. IEEE Access, 10:79180–79191. doi:10.1109/ACCESS.2022.3194559.
https://doi.org/10.1109/ACCESS.2022.3194559

B. Lyu, L. Qin, X. Lin, Y. Zhang, Z. Qian, and J. Zhou (2022). Maximum and top-k diversified biclique search at scale. The VLDB Journal, 31(6):1365–1389. doi:10.1007/s00778-021-00681-6. Online publication date: 18 Apr 2022.
https://doi.org/10.1007/s00778-021-00681-6

X. Zhu, X. Huang, B. Choi, J. Xu, W. Cheung, Y. Zhang, and J. Liu (2021). Efficient and Optimal Algorithms for Tree Summarization with Weighted Terminologies. IEEE Transactions on Knowledge and Data Engineering, pages 1–1. doi:10.1109/TKDE.2021.3120722.
https://doi.org/10.1109/TKDE.2021.3120722

S. Deep, A. Gruenheid, P. Koutris, J. Naughton, and S. Viglas (2020). Comprehensive and efficient workload compression. Proceedings of the VLDB Endowment, 14(3):418–430. doi:10.5555/3430915.3442439. Online publication date: 14 Dec 2020.

A. Prateek, A. Khan, A. Goyal, and S. Ranu (2020). Mining Top-k pairs of correlated subgraphs in a large network. Proceedings of the VLDB Endowment, 13(9):1511–1524. doi:10.14778/3397230.3397245. Online publication date: 26 Jun 2020.
https://dl.acm.org/doi/10.14778/3397230.3397245

H. Ben Lahmar and M. Herschel (2020). Collaborative filtering over evolution provenance data for interactive visual data exploration. Information Systems, article 101620. doi:10.1016/j.is.2020.101620. Online publication date: Aug 2020.
https://doi.org/10.1016/j.is.2020.101620

Y. Wang, Y. Li, J. Fan, C. Ye, and M. Chai (2020). A survey of typical attributed graph queries. World Wide Web. doi:10.1007/s11280-020-00849-0. Online publication date: 20 Nov 2020.
https://doi.org/10.1007/s11280-020-00849-0

J. Vachery, A. Arora, S. Ranu, and A. Bhattacharya (2019). RAQ: Relationship-Aware Graph Querying in Large Networks. In The World Wide Web Conference, pages 1886–1896. doi:10.1145/3308558.3313448. Online publication date: 13 May 2019.
https://dl.acm.org/doi/10.1145/3308558.3313448