
AFMPM: adaptive feature map pruning method based on feature distillation

  • Original Article
  • Published in: International Journal of Machine Learning and Cybernetics

Abstract

Feature distillation is a technique that transfers the intermediate-layer feature maps of a teacher network to a student network as knowledge. These feature maps reflect not only the image information but also the feature extraction ability of the teacher network. However, existing feature distillation methods lack theoretical guidance for evaluating feature maps, and they suffer from size mismatches between high-dimensional and low-dimensional feature maps as well as poor information utilization. In this paper, we propose an Adaptive Feature Map Pruning Method (AFMPM) for feature distillation, which recasts feature map pruning as an optimization problem so that the valid information in the feature map is retained to the maximum extent. AFMPM achieves significant improvements in feature distillation, and its effectiveness and generality are verified through experiments on both the teacher-student distillation framework and the self-distillation framework.
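For readers unfamiliar with the setup the abstract refers to, the sketch below shows a generic feature-distillation hint loss in the style of FitNets [Romero et al., 2015]: a learned 1x1 convolution adapts the student's intermediate feature map to the teacher's channel dimension before an L2 loss is applied. This is an illustrative assumption, not the AFMPM objective; the class name `HintLoss` and the channel/spatial sizes used in the example are hypothetical.

```python
# Generic feature-distillation hint loss (FitNets-style), NOT the AFMPM objective.
# Illustrative sketch only: module names and tensor sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class HintLoss(nn.Module):
    """L2 loss between a teacher feature map and an adapted student feature map."""

    def __init__(self, student_channels: int, teacher_channels: int):
        super().__init__()
        # A 1x1 convolution resolves the channel mismatch between the
        # low-dimensional student map and the high-dimensional teacher map.
        self.adapter = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, f_student: torch.Tensor, f_teacher: torch.Tensor) -> torch.Tensor:
        f_student = self.adapter(f_student)
        # Resolve any spatial-size mismatch by interpolating the student map.
        if f_student.shape[-2:] != f_teacher.shape[-2:]:
            f_student = F.interpolate(
                f_student, size=f_teacher.shape[-2:],
                mode="bilinear", align_corners=False,
            )
        # The teacher map is detached: only the student (and adapter) are trained.
        return F.mse_loss(f_student, f_teacher.detach())


# Usage with random tensors standing in for intermediate-layer outputs.
hint = HintLoss(student_channels=64, teacher_channels=256)
loss = hint(torch.randn(8, 64, 16, 16), torch.randn(8, 256, 8, 8))
```

In this formulation the adapter absorbs the dimensional mismatch mentioned in the abstract; AFMPM instead addresses it by pruning the feature maps themselves via an optimization criterion.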


Data availability

The datasets generated and analysed during the current study are available from the corresponding author on reasonable request. All data generated or analysed during this study are included in this published article.


Funding

This work was supported by the Natural Science Foundation of China (Grant No. 61976098) and the Science and Technology Development Foundation of Quanzhou City (Grant No. 2020C067). The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Author information

Corresponding author

Correspondence to Weiwei Zhang.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Guo, Y., Zhang, W., Wang, J. et al. AFMPM: adaptive feature map pruning method based on feature distillation. Int. J. Mach. Learn. & Cyber. 15, 573–588 (2024). https://doi.org/10.1007/s13042-023-01926-2

