Abstract
We study the problem of processing supergraph queries on graph databases. A graph database D is a large set of graphs. A supergraph query q on D retrieves all the graphs in D of which q is a supergraph. The large number of graphs in a database and the NP-completeness of subgraph isomorphism testing make it challenging to process supergraph queries efficiently. In this paper, a new approach to processing supergraph queries is proposed. First, a method for compactly organizing graph databases is presented. Common subgraphs of the graphs in a database are stored only once in the compact organization, which reduces the overall cost of the subgraph isomorphism tests from the stored graphs to the query during query processing. Then, an exact algorithm and an approximate algorithm for generating the significant feature set with an optimal order are proposed, followed by algorithms for constructing indices on graph databases. The optimal order on the feature set reduces the number of subgraph isomorphism tests performed during query processing. Based on the compact organization of graph databases, a novel algorithm for testing subgraph isomorphism from multiple graphs to one graph is presented. Finally, a query processing method that combines all of the above techniques is proposed. Analytical and experimental results show that the proposed algorithms outperform existing comparable algorithms by one to two orders of magnitude.
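To make the query definition concrete, the following is a minimal sketch of a naive baseline that answers a supergraph query by running one subgraph isomorphism test per stored graph. It is not the method proposed in the paper: it has no compact organization, no feature index, and no shared testing. The `Graph` class, the function names, and the example data are hypothetical, and the sketch assumes undirected graphs with vertex labels only and non-induced (monomorphism) matching.

```python
from typing import Dict, List, Set, Tuple


class Graph:
    """Undirected graph with labeled vertices (hypothetical helper type)."""

    def __init__(self, labels: Dict[int, str], edges: List[Tuple[int, int]]):
        self.labels = labels
        self.adj: Dict[int, Set[int]] = {v: set() for v in labels}
        for u, v in edges:
            self.adj[u].add(v)
            self.adj[v].add(u)


def is_subgraph(g: Graph, q: Graph) -> bool:
    """Return True if g is subgraph-isomorphic to q.

    Searches by backtracking for an injective, label-preserving mapping
    from the vertices of g to the vertices of q that sends every edge of
    g to an edge of q. Worst-case running time is exponential.
    """
    g_vertices = list(g.labels)
    mapping: Dict[int, int] = {}
    used: Set[int] = set()

    def extend(i: int) -> bool:
        if i == len(g_vertices):
            return True  # every vertex of g has been mapped
        u = g_vertices[i]
        for c in q.labels:
            if c in used or q.labels[c] != g.labels[u]:
                continue
            # Each already-mapped neighbour of u must map to a neighbour of c.
            if all(mapping[n] in q.adj[c] for n in g.adj[u] if n in mapping):
                mapping[u] = c
                used.add(c)
                if extend(i + 1):
                    return True
                del mapping[u]
                used.remove(c)
        return False

    return extend(0)


def supergraph_query(db: Dict[str, Graph], q: Graph) -> List[str]:
    """Return the names of all graphs in db that are contained in q."""
    return [name for name, g in db.items() if is_subgraph(g, q)]


# Query: a C-C-O triangle with an N attached to the O vertex.
q = Graph({0: "C", 1: "C", 2: "O", 3: "N"}, [(0, 1), (1, 2), (2, 0), (2, 3)])

# Toy database of small labeled graphs.
db = {
    "edge_CC": Graph({0: "C", 1: "C"}, [(0, 1)]),
    "path_CON": Graph({0: "C", 1: "O", 2: "N"}, [(0, 1), (1, 2)]),
    "triangle_CCC": Graph({0: "C", 1: "C", 2: "C"}, [(0, 1), (1, 2), (2, 0)]),
    "edge_CN": Graph({0: "C", 1: "N"}, [(0, 1)]),
}

result = supergraph_query(db, q)
print(result)
```

In this toy database the baseline performs four independent tests, one per stored graph. The cost of such repeated testing over a large database is what the abstract's compact organization and ordered feature set aim to reduce.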
Cite this article
Zhang, S., Gao, X., Wu, W. et al. Efficient algorithms for supergraph query processing on graph databases. J Comb Optim 21, 159–191 (2011). https://doi.org/10.1007/s10878-009-9221-1
DOI: https://doi.org/10.1007/s10878-009-9221-1