Step-by-Step and Tailored Teaching: Dynamic Knowledge Distillation

  • Conference paper
  • First Online:
Wireless Artificial Intelligent Computing Systems and Applications (WASA 2024)

Abstract

Knowledge distillation transfers knowledge from a large-scale teacher model to a compact student model and has attracted widespread attention in model compression and knowledge transfer. However, existing research faces several challenges. First, the teacher model is usually fixed during the transfer process, making it difficult to adapt teaching to the student model's individual learning state. Second, task difficulty affects the student model's ability to learn; in training, difficulty levels are often coupled with the data, which makes it hard to guide the student model toward mastering knowledge gradually. To address these issues, this study proposes a meta-learning knowledge distillation method with a dynamically learned temperature, namely Temperature Meta-learning Knowledge Distillation (TMKD). Inspired by meta-learning, the algorithm enables the teacher model to dynamically adjust its knowledge transfer strategy based on student feedback, realizing tailored teaching. Furthermore, a Dynamic Temperature Regulation Module (DTRM) is constructed to flexibly control task difficulty, allowing the student model to learn progressively. Finally, we design a Selective Insight Attention mechanism that directs the student network toward key information during learning and inference, thereby enhancing overall performance. Extensive experiments on CIFAR-100 and ImageNet demonstrate the effectiveness of our method.
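The abstract describes the method only at a high level, so the sketch below is a minimal orientation in PyTorch rather than the paper's implementation: it pairs the standard temperature-scaled distillation loss (Hinton et al., 2015) with a hypothetical learnable-temperature module standing in for the Dynamic Temperature Regulation Module. The names kd_loss and DynamicTemperature, the loss weighting, and the clamping range are illustrative assumptions; the actual DTRM, the Selective Insight Attention mechanism, and the meta-learning update driven by student feedback are not specified in this preview.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, temperature, alpha=0.5):
    """Classic distillation loss: cross-entropy on hard labels plus a
    temperature-scaled KL divergence between teacher and student outputs."""
    T = temperature
    soft_targets = F.softmax(teacher_logits / T, dim=1)
    log_student = F.log_softmax(student_logits / T, dim=1)
    # The KL term is scaled by T^2 so gradient magnitudes stay comparable
    # across different temperatures.
    kl = F.kl_div(log_student, soft_targets, reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kl + (1.0 - alpha) * ce

class DynamicTemperature(torch.nn.Module):
    """Hypothetical stand-in for the DTRM: a single learnable temperature
    that an outer (meta) loop could update from the student's feedback."""
    def __init__(self, init_t=4.0, min_t=1.0, max_t=10.0):
        super().__init__()
        self.raw_t = torch.nn.Parameter(torch.tensor(float(init_t)))
        self.min_t, self.max_t = min_t, max_t

    def forward(self):
        # Clamp to a sensible range so the softened targets stay well defined.
        return self.raw_t.clamp(self.min_t, self.max_t)
```

In a full pipeline one would obtain the temperature from this module at each step and update raw_t with a meta-objective on the student's held-out loss; that outer loop, and how it schedules task difficulty step by step, is what the paper's TMKD contributes beyond this baseline.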

Z. Zhang and L. Geng share the co-first authorship.

Acknowledgment

This work was supported by the Technological Innovation 2030 Major Project of “New Generation Artificial Intelligence” (2022ZD0118602), the Shandong Provincial Natural Science Foundation (ZR2021LZH008, ZR2022LZH010), the NSFC under Grant 62072278, Jinan City’s “20 New Universities” Funding Project 202333043, and the Open Project of Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Qilu University of Technology (Shandong Academy of Sciences) under Grant 2023ZD007.

Author information

Corresponding authors

Correspondence to Chunxiao Wang or Zhigang Zhao.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zhang, Z., Geng, L., Du, W., Li, F., Wang, C., Zhao, Z. (2025). Step-by-Step and Tailored Teaching: Dynamic Knowledge Distillation. In: Cai, Z., Takabi, D., Guo, S., Zou, Y. (eds) Wireless Artificial Intelligent Computing Systems and Applications. WASA 2024. Lecture Notes in Computer Science, vol 14997. Springer, Cham. https://doi.org/10.1007/978-3-031-71464-1_30

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-71464-1_30

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-71463-4

  • Online ISBN: 978-3-031-71464-1

  • eBook Packages: Computer Science, Computer Science (R0)
