DOI: 10.1145/3539597.3570480

Learning to Distill Graph Neural Networks

Published: 27 February 2023

Abstract

Graph Neural Networks (GNNs) can effectively capture both the topology and the attribute information of a graph, and have been extensively studied in many domains. Recently, there has been an emerging trend of equipping GNNs with knowledge distillation for better efficiency or effectiveness. However, to the best of our knowledge, existing knowledge distillation methods applied to GNNs all employ predefined distillation processes, which are controlled by several hyper-parameters without any supervision from the performance of the distilled models. Such isolation between distillation and evaluation leads to suboptimal results. In this work, we propose a general knowledge distillation framework that can be applied to any pretrained GNN model to further improve its performance. To address the isolation problem, we propose to parameterize and learn distillation processes suitable for distilling GNNs. Specifically, instead of introducing a unified temperature hyper-parameter as most previous work does, we learn node-specific distillation temperatures that lead to better performance of the distilled models. We first parameterize each node's temperature as a function of its neighborhood's encodings and predictions, and then design a novel iterative learning process that alternates between model distillation and temperature learning. We also introduce a scalable variant of our method to accelerate model training. Experimental results on five benchmark datasets show that our proposed framework can be applied to five popular GNN models and consistently improves their prediction accuracy, with a 3.12% relative enhancement on average. Moreover, the scalable variant enables 8 times faster training at the cost of 1% prediction accuracy.
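Since this page carries only the abstract, the following PyTorch sketch is a hedged illustration of the central idea rather than the authors' implementation: every name, shape, and design choice below (the NodeTemperature module, the softplus parameterization, the small MLP over concatenated neighborhood encodings and teacher predictions) is an assumption introduced only to make the node-specific-temperature idea concrete.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class NodeTemperature(nn.Module):
    """Hypothetical module (not the authors' code): predicts one positive
    temperature per node from that node's neighborhood encoding and the
    teacher GNN's prediction, as sketched in the abstract."""

    def __init__(self, enc_dim: int, num_classes: int, hidden: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(enc_dim + num_classes, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, neighborhood_enc: torch.Tensor,
                teacher_logits: torch.Tensor) -> torch.Tensor:
        # Softplus keeps each node's learned temperature strictly positive.
        raw = self.mlp(torch.cat([neighborhood_enc, teacher_logits], dim=-1))
        return F.softplus(raw) + 1e-6          # shape: [num_nodes, 1]


def distillation_loss(student_logits, teacher_logits, temperature):
    """Soft-label distillation loss in which the usual single, global
    temperature hyper-parameter is replaced by a per-node temperature."""
    t = temperature                            # [num_nodes, 1], broadcasts over classes
    soft_targets = F.softmax(teacher_logits / t, dim=-1)
    log_student = F.log_softmax(student_logits / t, dim=-1)
    # Per-node KL divergence, rescaled by t^2 as in standard distillation.
    kl = F.kl_div(log_student, soft_targets, reduction="none").sum(dim=-1)
    return (kl * t.squeeze(-1) ** 2).mean()
```

Training would then alternate, as the abstract describes, between updating the distilled model with this loss (plus the usual supervised loss) and updating the temperature module so that the distilled model performs better; the precise alternation schedule and the scalable variant are specified in the paper itself.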

Supplementary Material

MP4 File (17_wsdm2023_guo_distill_graph_01.mp4-streaming.mp4)
Learning to Distill Graph Neural Networks




    Published In

    WSDM '23: Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining
    February 2023
    1345 pages
    ISBN: 9781450394079
    DOI: 10.1145/3539597


    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. graph neural networks
    2. knowledge distillation

    Qualifiers

    • Research-article

    Funding Sources

    • National Natural Science Foundation of China

    Conference

    WSDM '23

    Acceptance Rates

    Overall Acceptance Rate 498 of 2,863 submissions, 17%


    Cited By

    • (2025) Data-Centric Graph Learning: A Survey. IEEE Transactions on Big Data 11:1, 1-20. DOI: 10.1109/TBDATA.2024.3489412. Online publication date: Feb 2025.
    • (2025) Learning robust MLPs on graphs via cross-layer distillation from a causal perspective. Pattern Recognition 162, 111367. DOI: 10.1016/j.patcog.2025.111367. Online publication date: Jun 2025.
    • (2025) NoRD: A framework for noise-resilient self-distillation through relative supervision. Applied Intelligence 55:6. DOI: 10.1007/s10489-025-06355-y. Online publication date: 1 Apr 2025.
    • (2024) Graph Condensation for Inductive Node Representation Learning. 2024 IEEE 40th International Conference on Data Engineering (ICDE), 3056-3069. DOI: 10.1109/ICDE60146.2024.00237. Online publication date: 13 May 2024.
    • (2023) Online cross-layer knowledge distillation on graph neural networks with deep supervision. Neural Computing and Applications 35:30, 22359-22374. DOI: 10.1007/s00521-023-08900-7. Online publication date: 8 Aug 2023.
    • (2023) Graph Representation Learning. In Representation Learning for Natural Language Processing, 169-210. DOI: 10.1007/978-981-99-1600-9_6. Online publication date: 24 Aug 2023.
