
Harmonizing Knowledge Transfer in Neural Network with Unified Distillation

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15091)

Included in the following conference series: Computer Vision – ECCV 2024 (ECCV 2024)

Abstract

Knowledge distillation (KD), known for its ability to transfer knowledge from a cumbersome network (teacher) to a lightweight one (student) without altering the architecture, has been garnering increasing attention. Two primary categories emerge within KD methods: feature-based, focusing on intermediate layers’ features, and logits-based, targeting the final layer’s logits. This paper introduces a novel perspective by leveraging diverse knowledge sources within a unified KD framework. Specifically, we aggregate features from intermediate layers into a comprehensive representation, effectively gathering semantic information from different stages and scales. Subsequently, we predict the distribution parameters from this representation. These steps transform knowledge from the intermediate layers into corresponding distributive forms, thereby allowing for knowledge distillation through a unified distribution constraint at different stages of the network, ensuring the comprehensiveness and coherence of knowledge transfer. Numerous experiments were conducted to validate the effectiveness of the proposed method.
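
To make the pipeline described above more concrete, the following is a minimal PyTorch-style sketch of the general idea only, not the authors' implementation: intermediate features are aggregated into one representation, distribution parameters are predicted from it, and teacher and student are aligned through a single distributional constraint. The 1x1-projection-plus-pooling aggregation, the diagonal-Gaussian parameterization, and all module and function names are illustrative assumptions made here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DistributionHead(nn.Module):
    """Aggregate multi-stage features and predict distribution parameters (sketch)."""

    def __init__(self, stage_channels, embed_dim=256):
        super().__init__()
        # 1x1 convolutions project each stage to a common channel width (assumption).
        self.proj = nn.ModuleList([nn.Conv2d(c, embed_dim, kernel_size=1)
                                   for c in stage_channels])
        # One linear head maps the aggregated representation to (mu, log_var).
        self.to_params = nn.Linear(embed_dim * len(stage_channels), 2 * embed_dim)

    def forward(self, feats):
        # feats: feature maps from different stages/scales, shapes (N, C_i, H_i, W_i).
        pooled = [F.adaptive_avg_pool2d(p(f), 1).flatten(1)
                  for p, f in zip(self.proj, feats)]
        agg = torch.cat(pooled, dim=1)                     # comprehensive representation
        mu, log_var = self.to_params(agg).chunk(2, dim=1)  # distribution parameters
        return mu, log_var


def unified_distillation_loss(student_params, teacher_params):
    """KL divergence between two diagonal Gaussians as one possible unified constraint."""
    mu_s, logv_s = student_params
    mu_t, logv_t = teacher_params
    kl = 0.5 * (logv_t - logv_s
                + (logv_s.exp() + (mu_s - mu_t) ** 2) / logv_t.exp()
                - 1.0)
    return kl.sum(dim=1).mean()
```

In training, one such head would be attached to both teacher and student (with the teacher side frozen or detached) and the resulting KL term added to the usual task and logit-distillation losses; the paper itself may choose the stages, aggregation, and distribution family differently.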



Acknowledgements

This work was supported by the National Key R&D Program of China (2022ZD0161800) and the National Natural Science Foundation of China (62271203).

Author information


Corresponding author

Correspondence to Guixu Zhang.


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Huang, Y., Yan, Z., Shen, C., Fang, F., Zhang, G. (2025). Harmonizing Knowledge Transfer in Neural Network with Unified Distillation. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15091. Springer, Cham. https://doi.org/10.1007/978-3-031-73414-4_4


  • DOI: https://doi.org/10.1007/978-3-031-73414-4_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-73413-7

  • Online ISBN: 978-3-031-73414-4

  • eBook Packages: Computer Science, Computer Science (R0)
