
Train Your Own GNN Teacher: Graph-Aware Distillation on Textual Graphs

  • Conference paper
  • First Online:
Machine Learning and Knowledge Discovery in Databases: Research Track (ECML PKDD 2023)

Abstract

How can we learn effective node representations on textual graphs? Graph Neural Networks (GNNs) that use Language Models (LMs) to encode textual information of graphs achieve state-of-the-art performance in many node classification tasks. Yet, combining GNNs with LMs has not been widely explored for practical deployments due to scalability issues. In this work, we tackle this challenge by developing a Graph-Aware Distillation framework (GraD) to encode graph structure into an LM for graph-free, fast inference. Different from conventional knowledge distillation, GraD jointly optimizes a GNN teacher and a graph-free student over the graph’s nodes via a shared LM. This encourages the graph-free student to exploit graph information encoded by the GNN teacher while, at the same time, enabling the GNN teacher to better leverage textual information from unlabeled nodes. As a result, the teacher and the student models learn from each other to improve their overall performance. Experiments on eight node classification benchmarks in both transductive and inductive settings showcase GraD’s superiority over existing distillation approaches for textual graphs. Our code and supplementary material are available at: https://github.com/cmavro/GRAD.
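To make the joint objective described above concrete, below is a minimal PyTorch-style sketch of graph-aware distillation with a shared encoder: a GNN teacher and a graph-free student both read node embeddings produced by one shared encoder, the teacher is supervised on labeled nodes, and the student is additionally trained to match the teacher's predictions on all nodes, so gradients from both branches update the shared encoder. All names (SharedEncoder, GNNTeacherHead, GraphFreeStudentHead), the mean-aggregation GNN layer, the loss weighting, and the use of a small MLP as a stand-in for the LM are illustrative assumptions, not the paper's exact formulation.

```python
# Illustrative sketch only: names and design choices are assumptions, not GraD's exact method.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SharedEncoder(nn.Module):
    """Stand-in for the shared LM: maps per-node (text) features to embeddings."""
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hid_dim), nn.ReLU(), nn.Linear(hid_dim, hid_dim)
        )

    def forward(self, x):
        return self.mlp(x)


class GNNTeacherHead(nn.Module):
    """One round of mean-neighbor aggregation followed by a linear classifier."""
    def __init__(self, hid_dim, n_classes):
        super().__init__()
        self.lin = nn.Linear(2 * hid_dim, n_classes)

    def forward(self, h, adj):
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        h_nbr = (adj @ h) / deg                       # mean over neighbors
        return self.lin(torch.cat([h, h_nbr], dim=1))


class GraphFreeStudentHead(nn.Module):
    """Classifier that only sees a node's own embedding (no graph needed at inference)."""
    def __init__(self, hid_dim, n_classes):
        super().__init__()
        self.lin = nn.Linear(hid_dim, n_classes)

    def forward(self, h):
        return self.lin(h)


def joint_loss(teacher_logits, student_logits, labels, labeled_mask,
               temperature=2.0, alpha=0.5):
    """Supervised loss on labeled nodes + soft-label matching on all nodes.

    Teacher logits are intentionally NOT detached here, so the distillation term
    also back-propagates through the shared encoder on unlabeled nodes; whether
    and how the teacher receives this gradient is a design choice in this sketch.
    """
    ce = (F.cross_entropy(teacher_logits[labeled_mask], labels[labeled_mask])
          + F.cross_entropy(student_logits[labeled_mask], labels[labeled_mask]))
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2
    return alpha * ce + (1.0 - alpha) * kd


# Toy usage with random data standing in for LM inputs and a real graph.
n, in_dim, hid, c = 100, 32, 64, 5
x = torch.randn(n, in_dim)                        # per-node input features
adj = (torch.rand(n, n) < 0.05).float()           # dense random adjacency
labels = torch.randint(0, c, (n,))
labeled = torch.zeros(n, dtype=torch.bool)
labeled[:20] = True                               # a few labeled nodes

enc = SharedEncoder(in_dim, hid)
teacher = GNNTeacherHead(hid, c)
student = GraphFreeStudentHead(hid, c)
opt = torch.optim.Adam(
    list(enc.parameters()) + list(teacher.parameters()) + list(student.parameters()), lr=1e-3
)

h = enc(x)                                        # shared encoder feeds both branches
loss = joint_loss(teacher(h, adj), student(h), labels, labeled)
loss.backward()
opt.step()
```

At inference time only the shared encoder and the student head are kept, so a prediction requires a single encoder forward pass per node and no neighborhood fetching, which is the graph-free, fast-inference setting the abstract describes.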

C. Mavromatis—Work done while interning at Amazon Web Services, Santa Clara.


Notes

  1. For example, the inference cost of a single transformer layer is \({\mathcal {O}}(L^2d+Ld^2)\), where L is the sequence length and d is the number of hidden dimensions.
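For reference, the two terms in this bound correspond to the two main components of a standard transformer layer; the breakdown below assumes the usual multi-head self-attention followed by a position-wise feed-forward network, and is a generic property of the architecture rather than a result specific to this paper.

```latex
% Per-layer cost for a length-L input with hidden size d (constant factors dropped).
\begin{align*}
  \text{attention scores } QK^{\top} \text{ and the weighted sum over values}
      &\;:\; \mathcal{O}(L^2 d) \\
  \text{Q/K/V/output projections and the feed-forward sublayer}
      &\;:\; \mathcal{O}(L d^2) \\
  \text{total for one layer}
      &\;:\; \mathcal{O}(L^2 d + L d^2)
\end{align*}
```

For long node texts the quadratic \(L^2 d\) term dominates, which is part of why running an LM together with multi-hop GNN aggregation at inference time is expensive and why a graph-free student is attractive.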


Acknowledgment

Part of this work was supported by NSF (1704074, 1757916, 1834251, 1834332). Access to research and computing facilities was provided by the College of Science & Engineering and the Minnesota Supercomputing Institute.

Author information


Corresponding author

Correspondence to Costas Mavromatis.


Ethics declarations

Limitations and Ethical Statement

GraD relies on informative input node features to learn effective shared LMs (or MLPs) that can generalize to unseen nodes, which is the case in textual graphs. One limitation is therefore that it is unclear how well GraD generalizes to other kinds of graphs, e.g., featureless graphs. Moreover, as a knowledge distillation approach, GraD trades accuracy for computational efficiency, and it cannot adapt to dynamic graphs with edge changes the way a GNN can. To overcome biases encoded in the training graph, e.g., stereotypes present in recommender graphs, GraD needs to be retrained on the new, unbiased graph.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Mavromatis, C., et al. (2023). Train Your Own GNN Teacher: Graph-Aware Distillation on Textual Graphs. In: Koutra, D., Plant, C., Gomez Rodriguez, M., Baralis, E., Bonchi, F. (eds) Machine Learning and Knowledge Discovery in Databases: Research Track. ECML PKDD 2023. Lecture Notes in Computer Science, vol 14171. Springer, Cham. https://doi.org/10.1007/978-3-031-43418-1_10


  • DOI: https://doi.org/10.1007/978-3-031-43418-1_10

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-43417-4

  • Online ISBN: 978-3-031-43418-1

  • eBook Packages: Computer Science (R0)
