DOI: 10.1145/3539597.3570480

Learning to Distill Graph Neural Networks

Published: 27 February 2023

Abstract

Graph Neural Networks (GNNs) can effectively capture both the topology and the attribute information of a graph, and have been extensively studied in many domains. Recently, there has been an emerging trend of equipping GNNs with knowledge distillation for better efficiency or effectiveness. However, to the best of our knowledge, existing knowledge distillation methods applied to GNNs all employ predefined distillation processes, which are controlled by several hyper-parameters without any supervision from the performance of the distilled models. Such isolation between distillation and evaluation leads to suboptimal results. In this work, we propose a general knowledge distillation framework that can be applied to any pretrained GNN model to further improve its performance. To address the isolation problem, we propose to parameterize and learn distillation processes suitable for distilling GNNs. Specifically, instead of introducing a unified temperature hyper-parameter as most previous work does, we learn node-specific distillation temperatures that lead to better performance of the distilled models. We first parameterize each node's temperature as a function of its neighborhood's encodings and predictions, and then design a novel iterative learning process that alternates between model distillation and temperature learning. We also introduce a scalable variant of our method to accelerate model training. Experimental results on five benchmark datasets show that our proposed framework can be applied to five popular GNN models and consistently improves their prediction accuracy, with a 3.12% relative enhancement on average. Moreover, the scalable variant enables 8 times faster training at the cost of 1% prediction accuracy.
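Since this page carries only the abstract, the following PyTorch sketch is a hedged illustration of the central idea rather than the authors' implementation: every name, shape, and design choice below (the NodeTemperature module, the softplus parameterization, the small MLP over concatenated neighborhood encodings and teacher predictions) is an assumption introduced only to make the node-specific-temperature idea concrete.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class NodeTemperature(nn.Module):
    """Hypothetical module (not the authors' code): predicts one positive
    temperature per node from that node's neighborhood encoding and the
    teacher GNN's prediction, as sketched in the abstract."""

    def __init__(self, enc_dim: int, num_classes: int, hidden: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(enc_dim + num_classes, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, neighborhood_enc: torch.Tensor,
                teacher_logits: torch.Tensor) -> torch.Tensor:
        # Softplus keeps each node's learned temperature strictly positive.
        raw = self.mlp(torch.cat([neighborhood_enc, teacher_logits], dim=-1))
        return F.softplus(raw) + 1e-6          # shape: [num_nodes, 1]


def distillation_loss(student_logits, teacher_logits, temperature):
    """Soft-label distillation loss in which the usual single, global
    temperature hyper-parameter is replaced by a per-node temperature."""
    t = temperature                            # [num_nodes, 1], broadcasts over classes
    soft_targets = F.softmax(teacher_logits / t, dim=-1)
    log_student = F.log_softmax(student_logits / t, dim=-1)
    # Per-node KL divergence, rescaled by t^2 as in standard distillation.
    kl = F.kl_div(log_student, soft_targets, reduction="none").sum(dim=-1)
    return (kl * t.squeeze(-1) ** 2).mean()
```

Training would then alternate, as the abstract describes, between updating the distilled model with this loss (plus the usual supervised loss) and updating the temperature module so that the distilled model performs better; the precise alternation schedule and the scalable variant are specified in the paper itself.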

Supplementary Material

MP4 File (17_wsdm2023_guo_distill_graph_01.mp4-streaming.mp4)
Learning to Distill Graph Neural Networks




    Published In

    WSDM '23: Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining
    February 2023
    1345 pages
    ISBN: 9781450394079
    DOI: 10.1145/3539597


    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. graph neural networks
    2. knowledge distillation

    Qualifiers

    • Research-article

    Funding Sources

    • National Natural Science Foundation of China

    Conference

    WSDM '23

    Acceptance Rates

    Overall Acceptance Rate 498 of 2,863 submissions, 17%


    Cited By

    • (2025) Data-Centric Graph Learning: A Survey. IEEE Transactions on Big Data 11:1, 1-20. DOI: 10.1109/TBDATA.2024.3489412. Online publication date: Feb 2025.
    • (2025) Learning robust MLPs on graphs via cross-layer distillation from a causal perspective. Pattern Recognition 162, 111367. DOI: 10.1016/j.patcog.2025.111367. Online publication date: Jun 2025.
    • (2025) NoRD: A framework for noise-resilient self-distillation through relative supervision. Applied Intelligence 55:6. DOI: 10.1007/s10489-025-06355-y. Online publication date: 1 Apr 2025.
    • (2024) Graph Condensation for Inductive Node Representation Learning. 2024 IEEE 40th International Conference on Data Engineering (ICDE), 3056-3069. DOI: 10.1109/ICDE60146.2024.00237. Online publication date: 13 May 2024.
    • (2023) Online cross-layer knowledge distillation on graph neural networks with deep supervision. Neural Computing and Applications 35:30, 22359-22374. DOI: 10.1007/s00521-023-08900-7. Online publication date: 8 Aug 2023.
    • (2023) Graph Representation Learning. In Representation Learning for Natural Language Processing, 169-210. DOI: 10.1007/978-981-99-1600-9_6. Online publication date: 24 Aug 2023.
