Abstract
Knowledge distillation transfers knowledge from a large teacher model to a smaller student model and has attracted widespread attention in model compression and knowledge transfer. However, existing approaches face several challenges. First, the teacher is typically fixed during transfer, so its teaching cannot be adjusted to the individual learning state of the student. Second, task difficulty strongly affects how well the student learns; because difficulty is usually coupled with the training data, it is hard to guide the student to master knowledge progressively. To address these issues, we propose a meta-learning knowledge distillation method with a dynamically learned temperature, termed Temperature Meta-learning Knowledge Distillation (TMKD). Inspired by meta-learning, TMKD enables the teacher to adjust its knowledge-transfer strategy according to student feedback, allowing tailored teaching. We further construct a Dynamic Temperature Regulation Module (DTRM) that flexibly controls task difficulty so the student can learn step by step. Finally, we design a Selective Insight Attention mechanism that keeps the student network focused on key information during learning and inference, improving overall performance. Extensive experiments on CIFAR-100 and ImageNet demonstrate the effectiveness of our method.
Z. Zhang and L. Geng share the co-first authorship.
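To make the temperature-driven mechanism concrete, the following is a minimal sketch (not the authors' released implementation) of a distillation loss whose softmax temperature is predicted per sample from the teacher's logits, in the spirit of the DTRM described in the abstract. The class name DynamicTemperatureKD, the temp_head network, and the temperature range are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicTemperatureKD(nn.Module):
    """Distillation loss with a per-sample temperature predicted from teacher logits (illustrative sketch)."""
    def __init__(self, num_classes, t_min=1.0, t_max=8.0):
        super().__init__()
        # Small head mapping teacher logits to one scalar temperature per sample.
        self.temp_head = nn.Sequential(nn.Linear(num_classes, 64), nn.ReLU(), nn.Linear(64, 1))
        self.t_min, self.t_max = t_min, t_max

    def forward(self, student_logits, teacher_logits, targets, alpha=0.5):
        # Temperature in [t_min, t_max]; the teacher's confidence pattern modulates task difficulty.
        t = torch.sigmoid(self.temp_head(teacher_logits.detach()))
        t = self.t_min + (self.t_max - self.t_min) * t  # shape (batch, 1)

        # Soft-target KL term, scaled by t^2 as in standard knowledge distillation.
        log_p_student = F.log_softmax(student_logits / t, dim=1)
        p_teacher = F.softmax(teacher_logits / t, dim=1)
        kd = F.kl_div(log_p_student, p_teacher, reduction="none").sum(dim=1) * t.squeeze(1) ** 2

        # Hard-label cross-entropy term.
        ce = F.cross_entropy(student_logits, targets, reduction="none")
        return (alpha * kd + (1.0 - alpha) * ce).mean()

In a full meta-learning setup, the temperature head would be updated in an outer loop from the student's feedback rather than trained jointly; this sketch only illustrates the per-sample dynamic temperature idea.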
Acknowledgment
This work was supported by the Technological Innovation 2030-Major Project of “New Generation Artificial Intelligence” (2022ZD0118602), the Shandong Provincial Natural Science Foundation (ZR2021LZH008, ZR2022LZH010), the NSFC under Grant 62072278, Jinan City’s “20 New Universities” Funding Project 202333043, and the Open Project of Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Qilu University of Technology (Shandong Academy of Sciences) under Grant 2023ZD007.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, Z., Geng, L., Du, W., Li, F., Wang, C., Zhao, Z. (2025). Step-by-Step and Tailored Teaching: Dynamic Knowledge Distillation. In: Cai, Z., Takabi, D., Guo, S., Zou, Y. (eds) Wireless Artificial Intelligent Computing Systems and Applications. WASA 2024. Lecture Notes in Computer Science, vol 14997. Springer, Cham. https://doi.org/10.1007/978-3-031-71464-1_30
DOI: https://doi.org/10.1007/978-3-031-71464-1_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-71463-4
Online ISBN: 978-3-031-71464-1