
PDD: Pruning Neural Networks During Knowledge Distillation

  • Research
  • Published in Cognitive Computation

Abstract

Although deep neural networks have achieved remarkable performance, their large computational requirements limit deployment on end devices. To address this, a variety of model compression and acceleration techniques have been developed. Among these, knowledge distillation has emerged as a popular approach, in which a small student model is trained to mimic the performance of a larger teacher model. However, the student architectures used in existing knowledge distillation methods are typically assumed to be compact, yet they are often suboptimal and contain redundancy, which raises questions about the validity of this assumption in practice. This study investigates that assumption and empirically demonstrates that student models can contain redundancy that is removable through pruning without significant performance degradation. We therefore propose a novel pruning method to eliminate redundancy in student models. Instead of applying traditional post-training pruning, we perform pruning during knowledge distillation (PDD), so that important information transferred from the teacher model to the student model is not lost. This is achieved by designing a differentiable mask for each convolutional layer, which dynamically adjusts the channels to be pruned based on the loss. Experimental results show that with ResNet20 as the student model and ResNet56 as the teacher model, PDD reduces FLOPs by 39.53% and removes 32.77% of parameters while increasing top-1 accuracy on CIFAR10 by 0.17%. With VGG11 as the student model and VGG16 as the teacher model, PDD reduces FLOPs by 74.96% and removes 76.43% of parameters with only a 1.34% loss in top-1 accuracy on CIFAR10. Our code is available at https://github.com/YihangZhou0424/PDD-Pruning-during-distillation.
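To make the mechanism described above concrete, the sketch below shows one way a differentiable per-channel mask could be attached to a convolutional layer and trained jointly with a distillation loss. It is a minimal illustration assuming a PyTorch implementation; the names (ChannelMask, MaskedConvBlock, pdd_style_loss) and all hyperparameter values are placeholders chosen for exposition, not the authors' code, which is available at the repository linked above.

# Minimal sketch: a differentiable per-channel mask trained alongside a
# Hinton-style distillation loss. Illustrative only; not the PDD implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelMask(nn.Module):
    """Learnable gate per output channel; the sigmoid keeps it differentiable."""
    def __init__(self, num_channels: int, temperature: float = 2.0 / 3.0):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(num_channels))
        self.temperature = temperature

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Soft gate in (0, 1); channels whose gate collapses toward 0
        # are candidates for removal after training.
        gate = torch.sigmoid(self.logits / self.temperature)
        return x * gate.view(1, -1, 1, 1)

    def sparsity_loss(self) -> torch.Tensor:
        # L1 pressure on the gates encourages channels to switch off.
        return torch.sigmoid(self.logits / self.temperature).sum()

class MaskedConvBlock(nn.Module):
    """Conv-BN-ReLU block whose output channels are gated by a ChannelMask."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.mask = ChannelMask(out_ch)

    def forward(self, x):
        return self.mask(F.relu(self.bn(self.conv(x))))

def pdd_style_loss(student_logits, teacher_logits, targets, masks,
                   T=4.0, alpha=0.9, lam=1e-4):
    """Cross-entropy + temperature-scaled KD term + sparsity penalty on the masks."""
    ce = F.cross_entropy(student_logits, targets)
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  F.softmax(teacher_logits / T, dim=1),
                  reduction="batchmean") * (T * T)
    sparsity = sum(m.sparsity_loss() for m in masks)
    return (1 - alpha) * ce + alpha * kd + lam * sparsity

After training, channels whose gates have collapsed toward zero can be physically removed and the surviving weights copied into a slimmer network, which is what yields the FLOPs and parameter reductions reported in the abstract.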


Data Availability

No datasets were generated or analyzed specifically for this study. All datasets used are publicly available and can be accessed and downloaded from open-source repositories.
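For example, the CIFAR10 dataset reported in the abstract can be obtained through torchvision; this is only one convenient way to download it and is not prescribed by the paper.

# Hypothetical example: downloading CIFAR10 with torchvision.
# Normalization statistics are the commonly used per-channel means and
# standard deviations, not values taken from the paper.
import torchvision
import torchvision.transforms as T

transform = T.Compose([
    T.ToTensor(),
    T.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=transform)
test_set = torchvision.datasets.CIFAR10(root="./data", train=False,
                                        download=True, transform=transform)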


Acknowledgements

We gratefully acknowledge the financial support for this research provided by the Beijing Natural Science Foundation under grant 4244098. This support has been invaluable to the completion of this work, enabling us to carry out our research with adequate resources and dedication. We extend our sincere thanks to the Foundation for their support and confidence in our project.

Author information

Authors and Affiliations

Authors

Xi Dan, Wenjie Yang, Fuyan Zhang, Yihang Zhou, Zhuojun Yu, Zhen Qiu, Boyuan Zhao, Zeyu Dong, Libo Huang and Chuanguang Yang

Contributions

Xi Dan led the conception and design of the study, played a pivotal role in developing the methodology, and conducted the primary experiments, making the greatest contribution to the work. Wenjie Yang and Boyuan Zhao contributed to the formulation of the experimental design, data analysis, and interpretation, and assisted significantly in drafting and revising the manuscript. Fuyan Zhang was involved in data collection and preprocessing and conducted supplementary experiments to validate the findings. Yihang Zhou and Zhen Qiu contributed to the development of the software used for data analysis, including coding and debugging, and participated in preparing the experimental setup. Zhuojun Yu provided substantial support in the literature review and data interpretation and was involved in the critical discussions that shaped the research conclusions. Zeyu Dong and Zhen Qiu assisted with the technical aspects of the project, including software optimization and troubleshooting, and contributed insights into the methodology. Libo Huang and Chuanguang Yang (the corresponding author) oversaw the project, refined the research questions, and significantly enhanced the manuscript through critical revisions for important intellectual content; their contributions were crucial in aligning the study with high standards of scientific rigor and in ensuring the integrity and accuracy of the work.

Funding Declaration: Beijing Natural Science Foundation, 4244098

Corresponding author

Correspondence to Chuanguang Yang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Conflict of Interest

The authors declare that there is no conflict of interest regarding the publication of this paper. No financial or personal relationships with other people or organizations have influenced the work reported in this manuscript, titled “PDD: Pruning Neural Networks During Knowledge Distillation” (Submission ID dccb382d-1a4b-4ee3-a28e-dd6137112c47). This research was conducted purely from an academic perspective, and all affiliations and financial support are disclosed in the manuscript. Our study was carried out with a commitment to transparency and honesty in all research findings and discussions related to this work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Dan, X., Yang, W., Zhang, F. et al. PDD: Pruning Neural Networks During Knowledge Distillation. Cogn Comput 16, 3457–3467 (2024). https://doi.org/10.1007/s12559-024-10350-9

