DOI: 10.1145/2588555.2610524
research-article

Answering top-k representative queries on graph databases

Published: 18 June 2014

Abstract

Given a function that classifies a data object as relevant or irrelevant, we consider the task of selecting k objects that best represent all relevant objects in the underlying database. This problem arises naturally when analysts want to familiarize themselves with the relevant objects in a database through a small set of k exemplars. In this paper, we solve the problem of top-k representative queries on graph databases. Although graph databases model a wide range of scientific data, solving the problem on graphs presents unique challenges because matching structures is inherently complex. Furthermore, the top-k representative query maps to the classic Set Cover problem, which makes it NP-hard. To overcome these challenges, we develop a greedy approximation with theoretical guarantees on the quality of the answer set, and we note that a better approximation is not feasible in polynomial time. To reduce the quadratic computational cost of the greedy algorithm, we propose an index structure called NB-Index, which indexes the θ-neighborhoods of the database graphs by employing a novel combination of Lipschitz embedding and agglomerative clustering. Extensive experiments on real graph datasets validate the efficiency and effectiveness of the proposed techniques, which achieve up to two orders of magnitude speed-up over state-of-the-art algorithms.
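
As a rough illustration of the greedy strategy the abstract describes, the sketch below picks k representatives by repeatedly choosing the object whose θ-neighborhood covers the most relevant objects not yet covered. It is a minimal sketch under stated assumptions, not the paper's implementation: integers stand in for graphs, absolute difference stands in for a graph distance, the relevance test is a made-up threshold, and the names `theta_neighborhoods` and `greedy_representatives` are hypothetical. The all-pairs neighborhood computation is the quadratic step that an index such as NB-Index is meant to avoid; no index is modelled here.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple, TypeVar

T = TypeVar("T")


def theta_neighborhoods(
    objects: Sequence[T],
    distance: Callable[[T, T], float],
    theta: float,
) -> Dict[int, Set[int]]:
    """Map each object index to the indices of objects within distance theta.

    This compares every pair of objects, so it costs O(n^2) distance calls.
    """
    n = len(objects)
    return {
        i: {j for j in range(n) if distance(objects[i], objects[j]) <= theta}
        for i in range(n)
    }


def greedy_representatives(
    neighborhoods: Dict[int, Set[int]], k: int
) -> Tuple[List[int], Set[int]]:
    """Greedily pick up to k indices, each maximizing newly covered objects."""
    covered: Set[int] = set()
    chosen: List[int] = []
    candidates = sorted(neighborhoods)
    for _ in range(k):
        remaining = [i for i in candidates if i not in chosen]
        if not remaining:
            break
        # Marginal gain = objects in the neighborhood that are still uncovered.
        best = max(remaining, key=lambda i: len(neighborhoods[i] - covered))
        if not neighborhoods[best] - covered:
            break  # every remaining candidate adds nothing new
        chosen.append(best)
        covered |= neighborhoods[best]
    return chosen, covered


# Toy data (assumption): integers replace graphs, |a - b| replaces graph distance.
database = [1, 2, 3, 10, 11, 20, 50]
relevant = [x for x in database if x < 40]  # hypothetical relevance classifier
neighborhoods = theta_neighborhoods(relevant, lambda a, b: abs(a - b), theta=1)
chosen, covered = greedy_representatives(neighborhoods, k=2)
print([relevant[i] for i in chosen], len(covered), "of", len(relevant), "covered")
```

On this toy input the sketch selects the values 2 and 10, which together cover five of the six relevant values; the isolated value 20 stays uncovered because k is limited to 2.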

    Published In

    SIGMOD '14: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data
    June 2014
    1645 pages
    ISBN:9781450323765
    DOI:10.1145/2588555
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. graph search
    2. representative power
    3. top-k

    Qualifiers

    • Research-article

    Conference

    SIGMOD/PODS'14
    Acceptance Rates

    SIGMOD '14 Paper Acceptance Rate 107 of 421 submissions, 25%;
    Overall Acceptance Rate 785 of 4,003 submissions, 20%

    Cited By

    • (2022) GREED. Proceedings of the 36th International Conference on Neural Information Processing Systems (22518-22530). DOI: 10.5555/3600270.3601906. Online publication date: 28-Nov-2022
    • (2022) ISUM: Efficiently Compressing Large and Complex Workloads for Scalable Index Tuning. Proceedings of the 2022 International Conference on Management of Data (660-673). DOI: 10.1145/3514221.3526152. Online publication date: 10-Jun-2022
    • (2022) Efficient Top-k Graph Similarity Search With GED Constraints. IEEE Access 10 (79180-79191). DOI: 10.1109/ACCESS.2022.3194559. Online publication date: 2022
    • (2022) Maximum and top-k diversified biclique search at scale. The VLDB Journal 31:6 (1365-1389). DOI: 10.1007/s00778-021-00681-6. Online publication date: 18-Apr-2022
    • (2021) Efficient and Optimal Algorithms for Tree Summarization with Weighted Terminologies. IEEE Transactions on Knowledge and Data Engineering (1-1). DOI: 10.1109/TKDE.2021.3120722. Online publication date: 2021
    • (2020) Comprehensive and efficient workload compression. Proceedings of the VLDB Endowment 14:3 (418-430). DOI: 10.5555/3430915.3442439. Online publication date: 14-Dec-2020
    • (2020) Mining Top-k pairs of correlated subgraphs in a large network. Proceedings of the VLDB Endowment 13:9 (1511-1524). DOI: 10.14778/3397230.3397245. Online publication date: 26-Jun-2020
    • (2020) Collaborative filtering over evolution provenance data for interactive visual data exploration. Information Systems (101620). DOI: 10.1016/j.is.2020.101620. Online publication date: Aug-2020
    • (2020) A survey of typical attributed graph queries. World Wide Web. DOI: 10.1007/s11280-020-00849-0. Online publication date: 20-Nov-2020
    • (2019) RAQ: Relationship-Aware Graph Querying in Large Networks. The World Wide Web Conference (1886-1896). DOI: 10.1145/3308558.3313448. Online publication date: 13-May-2019
