Efficient algorithms for supergraph query processing on graph databases

Journal of Combinatorial Optimization

Abstract

We study the problem of processing supergraph queries on graph databases. A graph database D is a large set of graphs. A supergraph query q on D retrieves all the graphs in D of which q is a supergraph. The large number of graphs in a database and the NP-completeness of subgraph isomorphism testing make it challenging to process supergraph queries efficiently. In this paper, we propose a new approach to processing supergraph queries. First, we present a method for compactly organizing graph databases: common subgraphs of the graphs in a database are stored only once, which reduces the overall cost of testing subgraph isomorphism from the stored graphs to a query during query processing. Next, we propose an exact algorithm and an approximate algorithm for generating the significant feature set with an optimal order, followed by algorithms for constructing indices on graph databases. The optimal order on the feature set reduces the number of subgraph isomorphism tests during query processing. Building on the compact organization, we present a novel algorithm for testing subgraph isomorphism from multiple graphs to one graph. Finally, we combine these techniques into a query processing method. Analytical and experimental results show that the proposed algorithms outperform existing similar algorithms by one to two orders of magnitude.
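To make the query definition concrete, the following is a minimal Python sketch of the naive baseline that the abstract argues against: test every graph in D against q, one subgraph isomorphism test per graph. It is not the method proposed in the paper; it uses no compact organization, feature set, or index. The graph representation and the names `make_graph`, `is_subgraph`, and `supergraph_query` are assumptions introduced for illustration, and the sketch assumes vertex-labeled undirected graphs with non-induced matching.

```python
from itertools import permutations

def make_graph(labels, edges):
    # Hypothetical representation: vertex -> label, plus a set of undirected edges.
    return {"labels": dict(labels), "edges": {frozenset(e) for e in edges}}

def is_subgraph(g, q):
    """Brute-force test: is g subgraph-isomorphic to q (labels preserved)?"""
    g_nodes = list(g["labels"])
    q_nodes = list(q["labels"])
    # Cheap size filter before the exponential search.
    if len(g_nodes) > len(q_nodes) or len(g["edges"]) > len(q["edges"]):
        return False
    # Try every injective mapping of g's vertices into q's vertices.
    for image in permutations(q_nodes, len(g_nodes)):
        m = dict(zip(g_nodes, image))
        if any(g["labels"][v] != q["labels"][m[v]] for v in g_nodes):
            continue
        # Every edge of g must map onto an edge of q.
        if all(frozenset(m[v] for v in e) in q["edges"] for e in g["edges"]):
            return True
    return False

def supergraph_query(database, q):
    """Return the names of all graphs in the database that q is a supergraph of."""
    return [name for name, g in database.items() if is_subgraph(g, q)]

# Query: a triangle with two C vertices and one O vertex.
q = make_graph({1: "C", 2: "C", 3: "O"}, [(1, 2), (2, 3), (1, 3)])

db = {
    "edge_CO": make_graph({"a": "C", "b": "O"}, [("a", "b")]),
    "path_CCO": make_graph({"x": "C", "y": "C", "z": "O"}, [("x", "y"), ("y", "z")]),
    "edge_NN": make_graph({"a": "N", "b": "N"}, [("a", "b")]),
}

result = supergraph_query(db, q)
print(result)  # ['edge_CO', 'path_CCO']
```

Each call to `is_subgraph` may enumerate a factorial number of mappings, and the baseline repeats that work for every stored graph even when the graphs share structure. That repeated cost is what storing common subgraphs once and ordering the feature set are meant to reduce.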

Corresponding author

Correspondence to Jianzhong Li.

About this article

Cite this article

Zhang, S., Gao, X., Wu, W. et al. Efficient algorithms for supergraph query processing on graph databases. J Comb Optim 21, 159–191 (2011). https://doi.org/10.1007/s10878-009-9221-1

  • DOI: https://doi.org/10.1007/s10878-009-9221-1
