Skip to main content
Log in

Efficient and scalable labeled subgraph matching using SGMatch

  • Regular Paper
  • Published:
Knowledge and Information Systems Aims and scope Submit manuscript

Abstract

Graphs are natural candidates for modeling application domains, such as social networks, pattern recognition, citation networks, or protein–protein interactions. One of the most challenging tasks in managing graphs is subgraph matching over data graphs, which attempts to find one-to-one correspondences, called solutions, among the query and data nodes. To compute solutions, most contemporary techniques use backtracking and recursion. An open research question is whether graphs can be matched based on parts and local solutions can be combined to reach a global matching. In this paper, we present an approach based on graph decomposition called SGMatch to match graphs. We represent graphs in smaller units called graphlets and develop a matching technique to leverage this representation. Pruning strategies use a new notion of edge covering called minimum hub cover and metadata, such as statistics and inverted indices, to reduce the number of matching candidates. Our evaluation of SGMatch versus contemporary algorithms, i.e., VF2, GraphQL, QuickSI, GADDI, or SPath, shows that SGMatch substantially improves the performance of current state-of-the-art techniques for larger query graphs with different structures, i.e., cliques, paths or subgraphs.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9

Similar content being viewed by others

References

  1. Amin MS, Finley RL, Jamil HM (2012) Top-k similar graph matching using TraM in biological networks. IEEE/ACM Trans Comput Biol Bioinform 9(6):1790–1804

    Article  Google Scholar 

  2. Bonnici V, Giugno R, Pulvirenti A, Shasha D, Ferro A (2013) A subgraph isomorphism algorithm and its application to biochemical data. BMC Bioinform 14(S–7):S13

    Article  Google Scholar 

  3. Cordella LP, Foggia P, Sansone C, Vento M (2004) A (sub)graph isomorphism algorithm for matching large graphs. IEEE Trans Pattern Anal Mach Intell 26(10):1367–1372

    Article  Google Scholar 

  4. Garey MR, Johnson DS (1979) Computers and intractability: a guide to the theory of NP-completeness. W. H. Freeman, San Francisco

    MATH  Google Scholar 

  5. Han W-S, Lee J, Lee J-H (2013) Turbo\(_{{\rm iso}}\): towards ultrafast and robust subgraph isomorphism search in large graph databases. In: SIGMOD conference, pp 337–348

  6. He H, Singh AK (2006) Closure-tree: an index structure for graph queries. In: ICDE, p 38

  7. He H, Singh AK(2008) Graphs-at-a-time: query language and access methods for graph databases. In: SIGMOD conference, pp 405–418

  8. Jamil HM (2009) A novel knowledge representation framework for computing sub-graph isomorphic queries in interaction network databases. In ICTAI, pp 131–138

  9. Jamil HM (2011) Computing subgraph isomorphic queries using structural unification and minimum graph structures. In SAC, pp 1053–1058

  10. Lee J, Han W-S, Kasperovics R, Lee J-H (2012) An in-depth comparison of subgraph isomorphism algorithms in graph databases. PVLDB 6(2):133–144

    Google Scholar 

  11. Leskovec J (2016) Stanford large network dataset collection. http://snap.stanford.edu/data/index.html

  12. Messmer BT, Bunke H (2000) Efficient subgraph isomorphism detection: a decomposition approach. IEEE Trans Knowl Data Eng 12(2):307–323

    Article  Google Scholar 

  13. Prechelt L (2000) An empirical comparison of seven programming languages. IEEE Comput 33(10):23–29

    Article  Google Scholar 

  14. Prud’homme C, Fages J-G (2016) CHOCO solver: a Java library for CSP and CP. http://choco-solver.org/

  15. Rivero CR, Hernández I, Ruiz D, Corchuelo R (2013) Benchmarking data exchange among semantic-web ontologies. IEEE Trans Knowl Data Eng 25(9):1997–2009

    Article  Google Scholar 

  16. Rivero CR, Hernández I, Ruiz D, Corchuelo R (2013) Exchanging data amongst linked data applications. Knowl Inf Syst 37(3):693–729

    Article  Google Scholar 

  17. Rivero CR, Jamil HM (2014) On matching graphs using disk-based processing: an XML and XQuery implementation. In: ICDE workshops, pp 20–27

  18. Rivero CR, Jamil HM (2016) SGMatch: implementation, data and query sets, running scripts. http://dblab.nkn.uidaho.edu/SGMatch/

  19. Shang H, Zhang Y, Lin X, Yu JX (2008) Taming verification hardness: an efficient algorithm for testing subgraph isomorphism. PVLDB 1(1):364–375

    Google Scholar 

  20. Sun Z, Wang H, Wang H, Shao B, Li J (2012) Efficient subgraph matching on billion node graphs. PVLDB 5(9):788–799

    Google Scholar 

  21. Tian Y, Balmin A, Corsten SA, Tatikonda S, McPherson J (2013) From “think like a vertex” to “think like a graph”. PVLDB 7(3):193–204

    Google Scholar 

  22. Tian Y, Patel JM(2008) TALE: a tool for approximate large graph matching. In: ICDE, pp 963–972

  23. Ullmann JR (1976) An algorithm for subgraph isomorphism. J ACM 23(1):31–42

    Article  MathSciNet  Google Scholar 

  24. Yelbay B, Birbil SI, Bülbül K, Jamil HM (2016) Approximating the minimum hub cover problem on planar graphs. Optim Lett 10(1):33–45

    Article  MathSciNet  MATH  Google Scholar 

  25. Zhang S, Li S, Yang J (2009) GADDI: distance index based subgraph matching in biological networks. In: EDBT, pp 192–203

  26. Zhang S, Yang J, Jin W (2010) SAPPER: subgraph indexing and approximate matching in large graphs. PVLDB 3(1):1185–1194

    Google Scholar 

  27. Zhao P, Han J (2010) On graph query optimization in large networks. PVLDB 3(1):340–351

    Google Scholar 

  28. Zobel J, Moffat A (2006) Inverted files for text search engines. ACM Comput Surv 38(2), Article No 6. doi:10.1145/1132956.1132959

  29. Zou Z, Li J, Gao H, Zhang S (2010) Mining frequent subgraph patterns from uncertain graph data. IEEE Trans Knowl Data Eng 22(9):1203–1218

    Article  Google Scholar 

Download references

Acknowledgments

This work was completed while Carlos R. Rivero was working as a postdoctoral researcher at the University of Idaho. The ILP formulation discussed in Sect. 3.2 for the computation of minimum hub cover was contributed by Belma Yelbay. We would like to thank the anonymous reviewers and Stanisław P. Radziszowski for their valuable suggestions. This research was supported in part by the National Science Foundation grant DRL 1515550, Idaho National Laboratory grant FFK039, and a University of Idaho Office of Research and Economic Development grant.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Carlos R. Rivero.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Rivero, C.R., Jamil, H.M. Efficient and scalable labeled subgraph matching using SGMatch. Knowl Inf Syst 51, 61–87 (2017). https://doi.org/10.1007/s10115-016-0968-2

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10115-016-0968-2

Keywords

Navigation