Substructure similarity search in graph databases

Authors:

Jiawei Han

SIGMOD '05: Proceedings of the 2005 ACM SIGMOD international conference on Management of data

Pages 766 - 777

https://doi.org/10.1145/1066157.1066244

Published: 14 June 2005

Abstract

Advanced database systems face a great challenge raised by the emergence of massive, complex structural data in bioinformatics, chem-informatics, and many other applications. The most fundamental support needed in these applications is the efficient search of complex structured data. Since exact matching is often too restrictive, similarity search of complex structures becomes a vital operation that must be supported efficiently.

In this paper, we investigate the issues of substructure similarity search using indexed features in graph databases. By transforming the edge relaxation ratio of a query graph into the maximum allowed missing features, our structural filtering algorithm, called Grafil, can filter many graphs without performing pairwise similarity computations. It is further shown that using either too few or too many features can result in poor filtering performance. Thus, the challenge is to design an effective feature set selection strategy for filtering. By examining the effect of different feature selection mechanisms, we develop a multi-filter composition strategy, where each filter uses a distinct and complementary subset of the features. We identify the criteria to form effective feature sets for filtering, and demonstrate that combining features with similar size and selectivity can improve the filtering and search performance significantly. Moreover, the concept presented in Grafil can be applied to searching approximate non-consecutive sequences, trees, and other complicated structures as well.
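The filtering idea in the abstract can be illustrated with a minimal Python sketch. It assumes that each graph is summarized as a multiset of indexed feature occurrences, and that the maximum number of allowed missing features has already been derived from the edge relaxation ratio; that derivation is not described in the abstract and is not shown here. The names `missing_features`, `filter_candidates`, and `max_missing`, along with the toy feature labels, are hypothetical and are not taken from Grafil itself.

```python
from collections import Counter
from typing import Dict, List


def missing_features(query: Counter, graph: Counter) -> int:
    """Count the query feature occurrences that the graph does not contain."""
    return sum(max(0, count - graph.get(feature, 0))
               for feature, count in query.items())


def filter_candidates(query: Counter,
                      database: Dict[str, Counter],
                      max_missing: int) -> List[str]:
    """Keep only the graphs that miss at most `max_missing` query features.

    `max_missing` is assumed to be supplied by the caller (hypothetical
    parameter); graphs above the bound are pruned without any pairwise
    similarity computation.
    """
    return [graph_id for graph_id, features in database.items()
            if missing_features(query, features) <= max_missing]


if __name__ == "__main__":
    # Toy feature counts; the labels "a", "b", "c" stand for indexed features.
    query = Counter({"a": 3, "b": 2, "c": 1})
    database = {
        "g1": Counter({"a": 3, "b": 2, "c": 1}),  # misses 0 occurrences
        "g2": Counter({"a": 1, "b": 2}),          # misses 2 + 0 + 1 = 3
        "g3": Counter({"a": 2, "b": 1, "c": 1}),  # misses 1 + 1 + 0 = 2
    }
    print(filter_candidates(query, database, max_missing=2))
```

In this sketch the filter only prunes: graphs that pass the bound remain candidates and would still need to be verified by an actual similarity computation.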

Published In

SIGMOD '05: Proceedings of the 2005 ACM SIGMOD international conference on Management of data

June 2005

990 pages

ISBN: 1595930604

DOI: 10.1145/1066157

Conference Chair:
Fatma Ozcan
IBM Almaden Research Center

Copyright © 2005 ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SIGMOD/PODS05: International Conference on Management of Data and Symposium on Principles of Database Systems

June 14 - 16, 2005

Baltimore, Maryland

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%
