Graph transfer learning

  • Regular Paper
  • Knowledge and Information Systems

Abstract

Graph embeddings have been tremendously successful at producing node representations that are discriminative for downstream tasks. In this paper, we study the problem of graph transfer learning: given two graphs and labels on the nodes of the first graph, we wish to predict the labels of the nodes of the second graph. We propose a tractable, non-combinatorial method for solving the graph transfer learning problem by combining classification and embedding losses with a continuous, convex penalty motivated by tractable graph distances. We demonstrate that our method predicts labels across graphs with almost perfect accuracy; in the same scenarios, training embeddings through standard methods leads to predictions that are no better than random.
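To make the idea of a continuous, convex penalty concrete, the following is a minimal sketch, assuming the penalty has the form ||AP − PB||_F² for adjacency matrices A and B and a doubly stochastic matrix P, in the spirit of tractable graph distances. The abstract does not state the exact formulation, so the function name `alignment_penalty`, the toy graphs, and the choice of the Frobenius norm are assumptions made for illustration only; the classification and embedding losses are not sketched here.

```python
import numpy as np

def alignment_penalty(A, B, P):
    """Squared Frobenius norm of A P - P B (hypothetical penalty form)."""
    R = A @ P - P @ B
    return float(np.sum(R * R))

# Hypothetical toy graph: a 3-node path 0-1-2.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)

# Build an isomorphic copy B by relabeling nodes with a permutation matrix.
perm = [1, 0, 2]
P_true = np.eye(3)[:, perm]      # column j is the unit vector e_{perm[j]}
B = P_true.T @ A @ P_true

# The true relabeling makes the penalty vanish; the identity does not.
pen_true = alignment_penalty(A, B, P_true)
pen_identity = alignment_penalty(A, B, np.eye(3))

# Relaxing P from permutations to doubly stochastic matrices keeps the
# feasible set convex, and the penalty is a convex quadratic in P.
# Check the convexity inequality at the midpoint of two feasible points.
P_mid = 0.5 * P_true + 0.5 * np.eye(3)
pen_mid = alignment_penalty(A, B, P_mid)
convex_ok = pen_mid <= 0.5 * pen_true + 0.5 * pen_identity + 1e-12

print(pen_true, pen_identity, pen_mid, convex_ok)
```

Because a penalty of this kind is minimized over a continuous set rather than over permutations, it can in principle be added to differentiable training losses and optimized with standard first-order methods, which is one way to read the "non-combinatorial" claim above.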

Notes

  1. https://github.com/neu-spiral/GraphTransferLearning-NEU

Acknowledgements

The authors gratefully acknowledge support by the National Science Foundation (Grants IIS-1741197, CCF-1750539) and Google via GCP credit support.

Author information

Corresponding author

Correspondence to Andrey Gritsenko.

About this article

Cite this article

Gritsenko, A., Shayestehfard, K., Guo, Y. et al. Graph transfer learning. Knowl Inf Syst 65, 1627–1656 (2023). https://doi.org/10.1007/s10115-022-01782-6

  • DOI: https://doi.org/10.1007/s10115-022-01782-6
