Efficient algorithms for supergraph query processing on graph databases

Zhang, Shuo; Gao, Xiaofeng; Wu, Weili; Li, Jianzhong; Gao, Hong

doi:10.1007/s10878-009-9221-1

Published: 17 March 2009

Journal of Combinatorial Optimization, volume 21, pages 159–191 (2011)

Shuo Zhang¹,
Xiaofeng Gao²,
Weili Wu²,
Jianzhong Li¹ &
Hong Gao¹

Abstract

We study the problem of processing supergraph queries on graph databases. A graph database D is a large set of graphs. A supergraph query q on D retrieves all the graphs in D of which q is a supergraph, that is, every graph in D that is subgraph-isomorphic to q. The large number of graphs in a database and the NP-completeness of subgraph isomorphism testing make it challenging to process supergraph queries efficiently. In this paper, a new approach to processing supergraph queries is proposed. First, a method for compactly organizing graph databases is presented. Common subgraphs of the graphs in a database are stored only once in the compact organization, which reduces the overall cost of the subgraph isomorphism tests from the stored graphs to the query during query processing. Then, an exact algorithm and an approximate algorithm for generating the significant feature set with an optimal order are proposed, followed by algorithms for constructing indices on graph databases. The optimal order on the feature set reduces the number of subgraph isomorphism tests performed during query processing. Based on the compact organization of graph databases, a novel algorithm for testing subgraph isomorphism from multiple graphs to one graph is presented. Finally, a query processing method that combines all of the above techniques is proposed. Analytical and experimental results show that the proposed algorithms outperform existing algorithms for the same problem by one to two orders of magnitude.
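To make the problem definition concrete, the following is a minimal Python sketch of a naive supergraph query: it tests every database graph against q with a simple backtracking subgraph isomorphism check. It is a baseline illustration only, not the compact organization, feature ordering, or multi-graph test proposed in the paper. The graph representation, the function names (make_graph, is_subgraph, supergraph_query, supergraph_query_filtered) and the toy data are all assumptions introduced here. The filtered variant shows only the generic exclusion principle behind feature-based pruning: if a feature f is not a subgraph of q, no database graph containing f can be a subgraph of q.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

# Assumed representation: (vertex -> label, set of undirected edges).
# Simple graphs only (no self-loops, no parallel edges).
Graph = Tuple[Dict[int, str], Set[FrozenSet[int]]]


def make_graph(labels: Dict[int, str], edges: Iterable[Tuple[int, int]]) -> Graph:
    return dict(labels), {frozenset(e) for e in edges}


def is_subgraph(g: Graph, q: Graph) -> bool:
    """True if g is subgraph-isomorphic to q (label-preserving, non-induced)."""
    g_labels, g_edges = g
    q_labels, q_edges = q
    order = list(g_labels)
    mapping: Dict[int, int] = {}  # vertex of g -> vertex of q

    def extend(i: int) -> bool:
        if i == len(order):
            return True
        v = order[i]
        for u in q_labels:
            if u in mapping.values() or q_labels[u] != g_labels[v]:
                continue
            # Every edge from v to an already mapped vertex must exist in q.
            ok = all(frozenset((u, mapping[w])) in q_edges
                     for w in mapping if frozenset((v, w)) in g_edges)
            if ok:
                mapping[v] = u
                if extend(i + 1):
                    return True
                del mapping[v]  # backtrack
        return False

    return extend(0)


def supergraph_query(db: List[Graph], q: Graph) -> List[int]:
    """Naive baseline: one subgraph isomorphism test per database graph."""
    return [i for i, g in enumerate(db) if is_subgraph(g, q)]


def supergraph_query_filtered(db: List[Graph], q: Graph,
                              features: List[Tuple[Graph, Set[int]]]) -> List[int]:
    """features: (feature graph, ids of database graphs that contain it)."""
    excluded: Set[int] = set()
    for f, containing_ids in features:
        if not is_subgraph(f, q):
            excluded |= containing_ids  # these graphs cannot be subgraphs of q
    return [i for i, g in enumerate(db)
            if i not in excluded and is_subgraph(g, q)]


# Toy data (hypothetical): q is a C-C-O triangle with an N attached to the O.
q = make_graph({0: "C", 1: "C", 2: "O", 3: "N"},
               [(0, 1), (1, 2), (0, 2), (2, 3)])
db = [
    make_graph({0: "C", 1: "C"}, [(0, 1)]),                           # C-C
    make_graph({0: "C", 1: "O", 2: "N"}, [(0, 1), (1, 2)]),           # C-O-N
    make_graph({0: "C", 1: "N"}, [(0, 1)]),                           # C-N
    make_graph({0: "C", 1: "C", 2: "C"}, [(0, 1), (1, 2), (0, 2)]),   # C3 triangle
]
features = [(make_graph({0: "C", 1: "N"}, [(0, 1)]), {2})]

answers = supergraph_query(db, q)
answers_filtered = supergraph_query_filtered(db, q, features)
```

In this toy example the filtered variant skips the full test for graph 2, because the C-N feature it contains is absent from q. Both functions return the same answer set; the filter only reduces the number of subgraph isomorphism tests against database graphs.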

Author information

Authors and Affiliations

Harbin Institute of Technology, Harbin, China
Shuo Zhang, Jianzhong Li & Hong Gao
University of Texas at Dallas, Dallas, USA
Xiaofeng Gao & Weili Wu

Corresponding author

Correspondence to Jianzhong Li.

About this article

Cite this article

Zhang, S., Gao, X., Wu, W. et al. Efficient algorithms for supergraph query processing on graph databases. J Comb Optim 21, 159–191 (2011). https://doi.org/10.1007/s10878-009-9221-1

Published: 17 March 2009
Issue Date: February 2011
DOI: https://doi.org/10.1007/s10878-009-9221-1
