skip to main content
article

Feature-based similarity search in graph structures

Published: 01 December 2006 Publication History

Abstract

Similarity search of complex structures is an important operation in graph-related applications since exact matching is often too restrictive. In this article, we investigate the issues of substructure similarity search using indexed features in graph databases. By transforming the edge relaxation ratio of a query graph into the maximum allowed feature misses, our structural filtering algorithm can filter graphs without performing pairwise similarity computation. It is further shown that using either too few or too many features can result in poor filtering performance. Thus the challenge is to design an effective feature set selection strategy that could maximize the filtering capability. We prove that the complexity of optimal feature set selection is Ω(2m) in the worst case, where m is the number of features for selection. In practice, we identify several criteria to build effective feature sets for filtering, and demonstrate that combining features with similar size and selectivity can improve the filtering and search performance significantly within a multifilter composition framework. The proposed feature-based filtering concept can be generalized and applied to searching approximate nonconsecutive sequences, trees, and other structured data as well.

References

[1]
Beretti, S., Bimbo, A., and Vicario, E. 2001. Efficient matching and indexing of graph models in content based retrieval. IEEE Trans. Patt. Analys. Mach. Intell. 23, 1089--1105.
[2]
Berman, H., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T., Weissig, H., Shindyalov, I., and Bourne, P. 2000. The protein data bank. Nucl. Acids Res. 28, 235--242.
[3]
Bunke, H. and Shearer, K. 1998. A graph distance metric based on the maximal common subgraph. Patt. Recogn. Lett. 19, 255--259.
[4]
Erickson, J. 1996. Lower bounds for fundamental geometric problems. Ph.D. dissertation. University of California at Berkeley, Berkley, CA.
[5]
Feige, U. 1998. A threshold of ln n for approximating set cover. J. ACM 45, 634--652.
[6]
Garey, M. and Johnson, D. 1979. Computers and Intractability: A Guide to the Theory of NP-Completeness. Freeman & Co., New York, NY.
[7]
Giugno, R. and Shasha, D. 2002. Graphgrep: A fast and universal method for querying graphs. Proceedings of the 2002 International Conference on Pattern Recognition (ICPR'02). 112--115.
[8]
Gravano, L., Ipeirotis, P., Jagadish, H., Koudas, N., Muthukrishnan, S., Pietarinen, L., and Srivastava, D. 2001. Using q-grams in a dbms for approximate string processing. Data Eng. Bull. 24, 28--37.
[9]
Hagadone, T. 1992. Molecular substructure similarity searching: Efficient retrieval in two-dimensional structure databases. J. Chem. Inf. Comput. Sci. 32, 515--521.
[10]
Hochbaum, D. 1997. Approximation Algorithms for NP-Hard Problems. PWS Publishing, Boston, MA.
[11]
Holder, L., Cook, D., and Djoko, S. 1994. Substructure discovery in the subdue system. In Proceedings of the AAAI'94 Workshop on Knowledge Discovery in Databases (KDD'94). 169--180.
[12]
Kailing, K., Kriegel, H., Schnauer, S., and Seidl, T. 2004. Efficient similarity search for hierarchical data in large databases. In Proceedings of the 9th International Conference on Extending Database Technology (EDBT'04). 676--693.
[13]
Kanehisa, M. and Goto, S. 2000. KEGG: Kyoto encyclopedia of genes and genomes. Nucl. Acids Res. 28, 27--30.
[14]
Kislicyn, S. S. 1964. On the selection of the kth element of an ordered set by pairwise comparisons. Sibirskii Mat. Z. (in Russian) 5, 557--564.
[15]
Kuramochi, M. and Karypis, G. 2001. Frequent subgraph discovery. In Proceedings of the 2001 International Conference on Data Mining (ICDM'01). 313--320.
[16]
Messmer, B. and Bunke, H. 1998. A new algorithm for error-tolerant subgraph isomorphism detection. IEEE Trans. Patt. Analys. Mach. Intell. 20, 493--504.
[17]
Navarro, G. 2001. A guided tour to approximate string matching. ACM Comput. Surv. 33, 31--88.
[18]
Nilsson, N. 1980. Principles of Artificial Intelligence. Morgan Kaufmann, Palo Alto, CA.
[19]
Padberg, M. 1995. Linear Optimization and Extensions. Springer-Verlag, Berlin, Germany.
[20]
Petrakis, E. and Faloutsos, C. 1997. Similarity searching in medical image databases. Knowl. Data Eng. 9, 3, 435--447.
[21]
Raymond, J., Gardiner, E., and Willett, P. 2002. Rascal: Calculation of graph similarity using maximum common edge subgraphs. Comput. J. 45, 631--644.
[22]
Shasha, D., Wang, J., and Giugno, R. 2002. Algorithmics and applications of tree and graph searching. In Proceedings of the 21th ACM Symposium on Principles of Database Systems (PODS'02). 39--52.
[23]
Srinivasa, S. and Kumar, S. 2003. A platform based on the multi-dimensional data model for analysis of bio-molecular structures. In Proceedings of the 2003 International Conference on Very Large Data Bases (VLDB'03). 975--986.
[24]
Ukkonen, E. 1992. Approximate string matching with q-grams and maximal matches. Theoret. Comput. Sci. 191--211.
[25]
Ullmann, J. 1977. Binary n-gram technique for automatic correction of substitution, deletion, insertion, and reversal errors in words. Comput. J. 20, 141--147.
[26]
Wang, J., Zhang, K., Jeong, K., and Shasha, D. 1994. A system for approximate tree matching. IEEE Trans. Knowl. Data Eng. 6, 559--571.
[27]
Willett, P., Barnard, J., and Downs, G. 1998. Chemical similarity searching. J. Chem. Inf. Comput. Sci. 38, 983--996.
[28]
Yan, X., Yu, P., and Han, J. 2004. Graph indexing: A frequent structure-based approach. In Proceedings of the 2004 ACM International Conference on Management of Data (SIGMOD'04). 335--346.
[29]
Yan, X., Yu, P., and Han, J. 2005. Substructure similarity search in graph databases. In Proceedings of the 2005 ACM International Conference on Management of Data (SIGMOD'05). 766--777.

Cited By

View all

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Transactions on Database Systems
ACM Transactions on Database Systems  Volume 31, Issue 4
December 2006
357 pages
ISSN:0362-5915
EISSN:1557-4644
DOI:10.1145/1189769
Issue’s Table of Contents

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 December 2006
Published in TODS Volume 31, Issue 4

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Graph database
  2. complexity
  3. index
  4. similarity search

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)19
  • Downloads (Last 6 weeks)2
Reflects downloads up to 20 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2023)BibliographyData Mining10.1016/B978-0-12-811760-6.00024-2(681-734)Online publication date: 2023
  • (2023)Pattern mining: advanced methodsData Mining10.1016/B978-0-12-811760-6.00015-1(175-237)Online publication date: 2023
  • (2022)DAHP: Deep Attention-Guided Hashing With Pairwise LabelsIEEE Transactions on Circuits and Systems for Video Technology10.1109/TCSVT.2021.307012932:3(933-946)Online publication date: Mar-2022
  • (2022)Deep parameter-free attention hashing for image retrievalScientific Reports10.1038/s41598-022-11217-512:1Online publication date: 30-Apr-2022
  • (2021)Deep Hash with Improved Dual Attention for Image RetrievalInformation10.3390/info1207028512:7(285)Online publication date: 20-Jul-2021
  • (2021)An energy aware grouping memetic algorithm to schedule the sensing activity in WSNs-based IoT for smart citiesApplied Soft Computing10.1016/j.asoc.2021.107473108(107473)Online publication date: Sep-2021
  • (2020)Information Is Selection—A Review of Basics Shows Substantial Potential for Improvement of Digital Information RepresentationInternational Journal of Environmental Research and Public Health10.3390/ijerph1708297517:8(2975)Online publication date: 24-Apr-2020
  • (2020)A survey on graph-based methods for similarity searches in metric spacesInformation Systems10.1016/j.is.2020.101507(101507)Online publication date: Feb-2020
  • (2018)Optimized and Frequent Subgraphs: How Are They Related?IEEE Access10.1109/ACCESS.2018.28466046(37237-37249)Online publication date: 2018
  • (2017)Querying Web-Scale Knowledge Graphs Through Effective Pruning of Search SpaceIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2017.266547828:8(2342-2356)Online publication date: 1-Aug-2017
  • Show More Cited By

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media