High efficiency and quality: large graphs matching

  • Regular Paper
The VLDB Journal

Abstract

Graph matching plays an essential role in many real applications. In this paper, we study how to match two large graphs by maximizing the number of matched edges, a problem known as maximum common subgraph matching, which is NP-hard. Exact matching methods cannot handle graphs with more than 30 nodes, while approximate methods can produce matchings of very poor quality. We propose a novel two-step approach that efficiently matches two large graphs with thousands of nodes at high matching quality. In the first step, we propose an anchor-selection/expansion approach to compute a good initial matching. In the second step, we propose a new approach to refine the initial matching. We show the optimality of our refinement and discuss how to randomly refine the matching with different combinations. We further show how to extend our solution to handle labeled graphs. We conducted extensive testing using real and synthetic datasets and report our findings in this paper.
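To make the objective concrete, the following minimal sketch counts the matched edges of a given node mapping and, for toy inputs only, finds the best mapping by exhaustive search. It is not the two-step algorithm proposed in the paper. The function names (`matched_edges`, `best_matching_bruteforce`) and the toy graphs are assumptions introduced purely for illustration, and graphs are assumed to be undirected and unlabeled.

```python
from itertools import permutations


def matched_edges(edges_g1, edges_g2, mapping):
    """Count edges (u, v) of G1 whose images under `mapping` are edges of G2.

    Graphs are treated as undirected; `mapping` sends G1 nodes to G2 nodes.
    (Illustrative helper, not taken from the paper.)
    """
    g2 = {frozenset(e) for e in edges_g2}
    count = 0
    for u, v in edges_g1:
        if u in mapping and v in mapping:
            if frozenset((mapping[u], mapping[v])) in g2:
                count += 1
    return count


def best_matching_bruteforce(nodes_g1, nodes_g2, edges_g1, edges_g2):
    """Try every injective mapping from G1 nodes to G2 nodes.

    Assumes len(nodes_g1) <= len(nodes_g2). The number of candidate mappings
    grows factorially, so this is only usable on toy inputs.
    """
    best_score, best_map = -1, {}
    for image in permutations(nodes_g2, len(nodes_g1)):
        mapping = dict(zip(nodes_g1, image))
        score = matched_edges(edges_g1, edges_g2, mapping)
        if score > best_score:
            best_score, best_map = score, mapping
    return best_score, best_map


# Toy inputs (hypothetical): a triangle matched against a 4-node path.
g1_nodes = [0, 1, 2]
g1_edges = [(0, 1), (1, 2), (2, 0)]
g2_nodes = ["a", "b", "c", "d"]
g2_edges = [("a", "b"), ("b", "c"), ("c", "d")]

mapping = {0: "a", 1: "b", 2: "c"}
# Two of the three triangle edges are preserved by this mapping.
print(matched_edges(g1_edges, g2_edges, mapping))
print(best_matching_bruteforce(g1_nodes, g2_nodes, g1_edges, g2_edges)[0])
```

The exhaustive search enumerates every injective mapping, which suggests why exact approaches stop scaling at a few dozen nodes and why a good initial matching followed by refinement is attractive for large graphs.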

Notes

  1. The conference version of this work was reported in [38].

  2. http://cbio.ensmp.fr/graphm/.

  3. http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

  4. http://math.nist.gov/MatrixMarket/data/Harwell-Boeing/bcspwr/bcspwr.html.

  5. http://cactus.nci.nih.gov/download/nci/.

  6. We cannot vary degrees for real datasets like PN.

References

  1. Abu-Khzam, F.N., Samatova, N.F., Rizk, M.A., Langston, M.A.: The maximum common subgraph problem: faster solutions via vertex cover. In: AICCSA, pp. 367–373 (2007)

  2. Almohamad, H.A., Duffuaa, S.O.: A linear programming approach for the weighted graph matching problem. IEEE Trans. Pattern Anal. Mach. Intell. 15(5), 522–525 (1993)

  3. Arora, S., Safra, S.: Approximating clique is NP-complete. In: Proceedings of the 33rd IEEE Symposium on Foundations of Computer Science, pp. 2–13 (1992)

  4. Bai, X., Yu, H., Hancock, E.: Graph matching using spectral embedding and alignment. In: Proceedings of International Conference on Pattern Recognition, pp. 398–401 (2004)

  5. Barabási, A., Albert, R.: Emergence of scaling in random networks. Science 286(5439), 509 (1999)

  6. Belkin, M., Niyogi, P.: Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 15(6), 1373–1396 (2003)

  7. Bernard, M., Richard, N., Paquereau, J.: Functional brain imaging by EEG graph-matching. In: 27th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC’05), pp. 5309–5312 (2005)

  8. Blondel, V., Gajardo, A., Heymans, M., Senellart, P., Van Dooren, P.: A measure of similarity between graph vertices: Applications to synonym extraction and web searching. SIAM Rev. 46(4), 647–666 (2004)

  9. Bonchi, F., Esfandiar, P., Gleich, D.F., Greif, C., Lakshmanan, L.V.S.: Fast matrix computations for pair-wise and column-wise commute times and Katz scores. CoRR abs/1104.3791 (2011)

  10. Caelli, T., Kosinov, S.: An eigenspace projection clustering method for inexact graph matching. IEEE Trans. Pattern Anal. Mach. Intell. 26(4), 515–519 (2004)

  11. Caelli, T., Kosinov, S.: Inexact graph matching using eigen-subspace projection clustering. Int. J. Pattern Recognit. Artif. Intell. 18(3), 329–354 (2004)

  12. Chevalier, F., Domenger, J.P., Benois-Pineau, J., Delest, M.: Retrieval of objects in video by similarity based on graph matching. Pattern Recogn. Lett. 28(8), 939–949 (2007)

  13. Conte, D., Foggia, P., Sansone, C., Vento, M.: Thirty years of graph matching in pattern recognition. IJPRAI 18(3), 265–298 (2004)

  14. Foster, K.C., Muth, S.Q., Potterat, J.J., Rothenberg, R.B.: A faster Katz status score algorithm. Comput. Math. Organ. Theory 7(4), 275–285 (2001)

  15. Jouili, S., Tabbone, S.: Graph matching based on node signatures. In: GbRPR, pp. 154–163 (2009)

  16. Kleinberg, J.M.: Authoritative sources in a hyperlinked environment. J. ACM 46(5), 604–632 (1999)

  17. Knossow, D., Sharma, A., Mateus, D., Horaud, R.: Inexact matching of large and sparse graphs using Laplacian eigenvectors. In: Proceedings of the 7th IAPR-TC-15 International Workshop on Graph-Based Representations in Pattern Recognition, p. 153. Springer (2009)

  18. Koch, I.: Enumerating all connected maximal common subgraphs in two graphs. Theor. Comput. Sci. 250(1–2), 1–30 (2001)

  19. Krissinel, E., Henrick, K.: Common subgraph isomorphism detection by backtracking search. Softw. Practice Experience 34(6), 591–607 (2004)

  20. Lee, W., Duin, R.: An inexact graph comparison approach in joint eigenspace. In: Proceedings of the 2008 Joint IAPR International Workshop on Structural, Syntactic, and Statistical Pattern Recognition, p. 44. Springer (2008)

  21. McGregor, J.: Backtrack search algorithms and the maximal common subgraph problem. Softw. Practice Experience 12(1), 23–34 (1982)

  22. Melnik, S., Garcia-Molina, H., Rahm, E.: Similarity flooding: a versatile graph matching algorithm and its application to schema matching. In: ICDE, pp. 117–128 (2002)

  23. Newman, M.E.J.: Power laws, Pareto distributions and Zipf’s law. Contemp. Phys. 46, 323–351 (2005)

  24. Ogata, H., Fujibuchi, W., Goto, S., Kanehisa, M.: A heuristic graph comparison algorithm and its application to detect functionally related enzyme clusters. Nucleic Acids Res. 28(20), 4021–4028 (2000)

  25. Qiu, H., Hancock, E.: Graph matching and clustering using spectral partitions. Pattern Recognit. 39(1), 22–34 (2006)

  26. Raymond, J., Gardiner, E., Willett, P.: Rascal: Calculation of graph similarity using maximum common edge subgraphs. Comput. J. 45(6), 631 (2002)

  27. Riesen, K., Jiang, X., Bunke, H.: Exact and inexact graph matching: methodology and applications. In: Managing and Mining Graph Data (Chapter 7) (2010)

  28. Singh, R., Xu, J., Berger, B.: Pairwise global alignment of protein interaction networks by matching neighborhood topology. In: Research in Computational Molecular Biology, pp. 16–31. Springer (2007)

  29. Suters, W., Abu-Khzam, F., Zhang, Y., Symons, C., Samatova, N., Langston, M.: A new approach and faster exact methods for the maximum common subgraph problem. Comput. Comb. 717–727 (2005)

  30. Tong, H., Faloutsos, C., Pan, J.-Y.: Random walk with restart: fast solutions and applications. Knowl. Inf. Syst. 14(3), 327–346 (2008)

  31. Ullmann, J.: An algorithm for subgraph isomorphism. J. ACM (JACM) 23(1), 42 (1976)

  32. Umeyama, S.: An eigendecomposition approach to weighted graph matching problems. IEEE Trans. Pattern Anal. Mach. Intell. 10(5), 695–703 (1988)

  33. Watts, D., Strogatz, S.: Collective dynamics of ‘small-world’ networks. Nature 393(6684), 440–442 (1998)

  34. Xiao, B., Hancock, E., Wilson, R.: A generative model for graph matching and embedding. Comput. Vis. Image Underst. 113(7), 777–789 (2009)

  35. Xu, L., King, I.: A PCA approach for fast retrieval of structural patterns in attributed graphs. IEEE Trans. Syst. Man Cybern. B Cybern. 31(5), 812–817 (2001)

  36. Zaslavskiy, M., Bach, F., Vert, J.: A path following algorithm for the graph matching problem. IEEE Trans. Pattern Anal. Mach. Intell. 31(12), 2227–2242 (2009)

  37. Zaslavskiy, M., Bach, F., Vert, J.: Global alignment of protein-protein interaction networks by graph matching methods. Bioinformatics 25(12), i259 (2009)

  38. Zhu, Y., Qin, L., Yu, J.X., Ke, Y., Lin, X.: High efficiency and quality: large graphs matching. In: CIKM (2011)

Acknowledgments

The work was supported by the Research Grants Council of the Hong Kong SAR, China (419109), ARC Discovery Grants (ARCDP0987557, ARCDP110102937, ARCDP120104168), and NSFC61021004.

Author information

Corresponding author

Correspondence to Jeffrey Xu Yu.

About this article

Cite this article

Zhu, Y., Qin, L., Yu, J.X. et al. High efficiency and quality: large graphs matching. The VLDB Journal 22, 345–368 (2013). https://doi.org/10.1007/s00778-012-0292-8

  • DOI: https://doi.org/10.1007/s00778-012-0292-8
