Abstract
Knowledge distillation transfers knowledge from a large teacher model to a smaller student model and has attracted widespread attention in model compression and knowledge transfer. However, existing approaches face several challenges. First, the teacher is typically fixed during transfer, so its teaching cannot be adjusted to the individual learning state of the student. Second, task difficulty strongly affects how well the student learns; because difficulty is usually coupled with the training data, it is hard to guide the student to master knowledge progressively. To address these issues, we propose a meta-learning knowledge distillation method with a dynamically learned temperature, termed Temperature Meta-learning Knowledge Distillation (TMKD). Inspired by meta-learning, TMKD enables the teacher to adjust its knowledge-transfer strategy according to student feedback, allowing tailored teaching. We further construct a Dynamic Temperature Regulation Module (DTRM) that flexibly controls task difficulty so the student can learn step by step. Finally, we design a Selective Insight Attention mechanism that keeps the student network focused on key information during learning and inference, improving overall performance. Extensive experiments on CIFAR-100 and ImageNet demonstrate the effectiveness of our method.
Z. Zhang and L. Geng share the co-first authorship.
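To make the temperature-driven mechanism concrete, the following is a minimal sketch (not the authors' released implementation) of a distillation loss whose softmax temperature is predicted per sample from the teacher's logits, in the spirit of the DTRM described in the abstract. The class name DynamicTemperatureKD, the temp_head network, and the temperature range are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicTemperatureKD(nn.Module):
    """Distillation loss with a per-sample temperature predicted from teacher logits (illustrative sketch)."""
    def __init__(self, num_classes, t_min=1.0, t_max=8.0):
        super().__init__()
        # Small head mapping teacher logits to one scalar temperature per sample.
        self.temp_head = nn.Sequential(nn.Linear(num_classes, 64), nn.ReLU(), nn.Linear(64, 1))
        self.t_min, self.t_max = t_min, t_max

    def forward(self, student_logits, teacher_logits, targets, alpha=0.5):
        # Temperature in [t_min, t_max]; the teacher's confidence pattern modulates task difficulty.
        t = torch.sigmoid(self.temp_head(teacher_logits.detach()))
        t = self.t_min + (self.t_max - self.t_min) * t  # shape (batch, 1)

        # Soft-target KL term, scaled by t^2 as in standard knowledge distillation.
        log_p_student = F.log_softmax(student_logits / t, dim=1)
        p_teacher = F.softmax(teacher_logits / t, dim=1)
        kd = F.kl_div(log_p_student, p_teacher, reduction="none").sum(dim=1) * t.squeeze(1) ** 2

        # Hard-label cross-entropy term.
        ce = F.cross_entropy(student_logits, targets, reduction="none")
        return (alpha * kd + (1.0 - alpha) * ce).mean()

In a full meta-learning setup, the temperature head would be updated in an outer loop from the student's feedback rather than trained jointly; this sketch only illustrates the per-sample dynamic temperature idea.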
Acknowledgment
This work was supported by the Technological Innovation 2030-Major Project of “New Generation Artificial Intelligence” (2022ZD0118602), the Shandong Provincial Natural Science Foundation (ZR2021LZH008, ZR2022LZH010), the NSFC under Grant 62072278, Jinan City’s “20 New Universities” Funding Project 202333043, and the Open Project of Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Qilu University of Technology (Shandong Academy of Sciences) under Grant 2023ZD007.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, Z., Geng, L., Du, W., Li, F., Wang, C., Zhao, Z. (2025). Step-by-Step and Tailored Teaching: Dynamic Knowledge Distillation. In: Cai, Z., Takabi, D., Guo, S., Zou, Y. (eds) Wireless Artificial Intelligent Computing Systems and Applications. WASA 2024. Lecture Notes in Computer Science, vol 14997. Springer, Cham. https://doi.org/10.1007/978-3-031-71464-1_30
DOI: https://doi.org/10.1007/978-3-031-71464-1_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-71463-4
Online ISBN: 978-3-031-71464-1