Abstract
Feature distillation is a technique that transfers the intermediate feature maps of a teacher network to a student network as knowledge. These feature maps reflect not only the image content but also the feature-extraction ability of the teacher network. However, existing feature distillation methods lack theoretical guidance for evaluating feature maps and suffer from size mismatches between high-dimensional and low-dimensional feature maps, as well as poor information utilization. In this paper, we propose an Adaptive Feature Map Pruning Method (AFMPM) for feature distillation, which casts feature map pruning as an optimization problem so that the valid information in the feature map is retained to the maximum extent. AFMPM achieves significant improvements in feature distillation, and its effectiveness and generality are verified through experiments on both the teacher-student distillation framework and the self-distillation framework.
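To make the idea of supervising a student with pruned teacher feature maps concrete, the sketch below shows one possible (assumed) realization in PyTorch. It is not the AFMPM algorithm described in the paper: the channel-importance score (mean absolute activation), the top-k selection, the 1x1 projection used to resolve the dimension mismatch, and the MSE matching loss are all illustrative choices.

```python
# Minimal, illustrative sketch of feature distillation with feature-map pruning.
# NOT the paper's AFMPM method; scoring, selection, projection, and loss are
# assumed placeholders chosen only to show the overall structure.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PrunedFeatureDistillLoss(nn.Module):
    def __init__(self, student_channels: int, kept_teacher_channels: int):
        super().__init__()
        # 1x1 convolution maps student features to the pruned teacher width,
        # resolving the size mismatch between the two feature maps.
        self.proj = nn.Conv2d(student_channels, kept_teacher_channels, kernel_size=1)
        self.k = kept_teacher_channels

    def forward(self, f_student: torch.Tensor, f_teacher: torch.Tensor) -> torch.Tensor:
        # f_student: (B, Cs, H, W); f_teacher: (B, Ct, H, W) with Ct >= k.
        # Score teacher channels by mean absolute activation (an assumed,
        # simple importance measure) and keep only the top-k channels.
        with torch.no_grad():
            scores = f_teacher.abs().mean(dim=(0, 2, 3))   # (Ct,)
            keep = scores.topk(self.k).indices              # (k,)
        f_teacher_pruned = f_teacher[:, keep]               # (B, k, H, W)

        # Project student features and match them to the pruned teacher map.
        f_student_proj = self.proj(f_student)               # (B, k, H, W)
        return F.mse_loss(f_student_proj, f_teacher_pruned)


if __name__ == "__main__":
    # Toy usage with random tensors standing in for intermediate activations.
    loss_fn = PrunedFeatureDistillLoss(student_channels=64, kept_teacher_channels=128)
    fs = torch.randn(8, 64, 16, 16)    # student feature map
    ft = torch.randn(8, 256, 16, 16)   # teacher feature map
    print(loss_fn(fs, ft).item())
```

In a real training loop this loss would be added to the student's task loss; AFMPM instead determines which feature-map information to retain by solving an optimization problem rather than using a fixed heuristic score.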
Data availability
The datasets generated and analysed during the current study are available from the corresponding author on reasonable request. All data generated or analysed during this study are included in this published article.
Funding
This work was supported by the Natural Science Foundation of China (Grant No. 61976098) and the Science and Technology Development Foundation of Quanzhou City (Grant No. 2020C067). The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Guo, Y., Zhang, W., Wang, J. et al. AFMPM: adaptive feature map pruning method based on feature distillation. Int. J. Mach. Learn. & Cyber. 15, 573–588 (2024). https://doi.org/10.1007/s13042-023-01926-2