Abstract
How can we learn effective node representations on textual graphs? Graph Neural Networks (GNNs) that use Language Models (LMs) to encode the textual information of graphs achieve state-of-the-art performance in many node classification tasks. Yet, combining GNNs with LMs has not been widely explored in practical deployments due to scalability issues. In this work, we tackle this challenge by developing a Graph-Aware Distillation framework (GraD) that encodes graph structure into an LM for graph-free, fast inference. Unlike conventional knowledge distillation, GraD jointly optimizes a GNN teacher and a graph-free student over the graph's nodes via a shared LM. This encourages the graph-free student to exploit graph information encoded by the GNN teacher while, at the same time, enabling the GNN teacher to better leverage textual information from unlabeled nodes. As a result, the teacher and the student models learn from each other and improve their overall performance. Experiments on eight node classification benchmarks, in both transductive and inductive settings, showcase GraD's superiority over existing distillation approaches for textual graphs. Our code and supplementary material are available at: https://github.com/cmavro/GRAD.
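As a rough illustration of the joint objective described above, the following is a minimal PyTorch sketch under toy assumptions: the shared LM is replaced by a bag-of-words encoder, the GNN teacher by a single mean-aggregation layer, and the names TextEncoder, GNNTeacher, GraphFreeStudent, and grad_step are hypothetical, not the authors' implementation.

```python
# Toy sketch of GraD-style joint teacher/student training over a shared encoder.
# `node_tokens` are integer token ids per node, `adj` is a dense adjacency matrix.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextEncoder(nn.Module):
    """Stand-in for the shared LM: averages token embeddings per node."""
    def __init__(self, vocab_size, dim):
        super().__init__()
        self.emb = nn.EmbeddingBag(vocab_size, dim, mode="mean")
    def forward(self, node_tokens):           # (num_nodes, seq_len) int ids
        return self.emb(node_tokens)           # (num_nodes, dim)

class GNNTeacher(nn.Module):
    """One-layer mean-aggregation GNN head on top of the shared encoder."""
    def __init__(self, dim, num_classes):
        super().__init__()
        self.lin = nn.Linear(dim, num_classes)
    def forward(self, h, adj):                 # adj: (num_nodes, num_nodes)
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        return self.lin(adj @ h / deg)         # aggregate neighbor text features

class GraphFreeStudent(nn.Module):
    """Graph-free head: classifies each node from its own text feature only."""
    def __init__(self, dim, num_classes):
        super().__init__()
        self.lin = nn.Linear(dim, num_classes)
    def forward(self, h):
        return self.lin(h)

def grad_step(encoder, teacher, student, opt, node_tokens, adj, labels,
              labeled_mask, alpha=0.5, tau=1.0):
    """One joint update: both heads share the encoder, so gradients from the
    GNN teacher and the graph-free student both shape the LM representation."""
    opt.zero_grad()
    h = encoder(node_tokens)
    t_logits = teacher(h, adj)
    s_logits = student(h)
    # Supervised losses on labeled nodes for both heads.
    sup = F.cross_entropy(t_logits[labeled_mask], labels[labeled_mask]) \
        + F.cross_entropy(s_logits[labeled_mask], labels[labeled_mask])
    # Distillation: the student matches the teacher's soft predictions on all nodes.
    kd = F.kl_div(F.log_softmax(s_logits / tau, dim=-1),
                  F.softmax(t_logits.detach() / tau, dim=-1),
                  reduction="batchmean") * tau ** 2
    loss = sup + alpha * kd
    loss.backward()
    opt.step()
    return loss.item()
```

In this sketch, a single optimizer over the encoder and both heads (e.g., torch.optim.Adam over encoder, teacher, and student parameters) realizes the joint optimization: gradients from the GNN teacher and the graph-free student both update the shared text encoder, and at inference time only the encoder and student are needed, without graph access.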
C. Mavromatis—Work done while interning at Amazon Web Services, Santa Clara.
Notes
1. For example, the inference cost of a single transformer layer is \({\mathcal {O}}(L^2d + Ld^2)\), where L is the sequence length and d is the number of hidden dimensions.
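As a back-of-the-envelope illustration of this note (the concrete numbers below are assumptions, not values from the paper), the two terms can be compared directly:

```python
# Rough O(L^2 d + L d^2) operation count for one transformer layer:
# the L^2 d term (self-attention) dominates for long sequences,
# the L d^2 term (projections/feed-forward) dominates for wide models.
def transformer_layer_cost(L: int, d: int) -> int:
    attention = L * L * d      # pairwise token interactions
    projections = L * d * d    # per-token linear maps
    return attention + projections

print(transformer_layer_cost(L=512, d=768))  # ~5e8 ops, attention-heavy
print(transformer_layer_cost(L=128, d=768))  # ~9e7 ops, projection-heavy
```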
Acknowledgment
Part of this work was supported by NSF (1704074, 1757916, 1834251, 1834332). Access to research and computing facilities was provided by the College of Science & Engineering and the Minnesota Supercomputing Institute.
Ethics declarations
Limitations and Ethical Statement
GraD relies on informative input node features, as is the case in textual graphs, to learn effective shared LMs (or MLPs) that generalize to unseen nodes. One limitation is therefore that it is unclear how well GraD generalizes to other kinds of graphs, e.g., featureless graphs. Moreover, as a knowledge distillation approach, GraD trades accuracy for computational efficiency and cannot adapt to dynamic graphs with edge changes the way a GNN could. To overcome biases encoded in the training graph, e.g., stereotypes present in recommendation graphs, GraD needs to be retrained on a new, unbiased graph.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Mavromatis, C., et al. (2023). Train Your Own GNN Teacher: Graph-Aware Distillation on Textual Graphs. In: Koutra, D., Plant, C., Gomez Rodriguez, M., Baralis, E., Bonchi, F. (eds) Machine Learning and Knowledge Discovery in Databases: Research Track. ECML PKDD 2023. Lecture Notes in Computer Science, vol. 14171. Springer, Cham. https://doi.org/10.1007/978-3-031-43418-1_10
DOI: https://doi.org/10.1007/978-3-031-43418-1_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43417-4
Online ISBN: 978-3-031-43418-1