
Fine-Tuning Channel-Pruned Deep Model via Knowledge Distillation

  • Regular Paper
  • Artificial Intelligence and Pattern Recognition

Abstract

Deep convolutional neural networks with high performance are hard to deploy in many real-world applications, since the computing resources of edge devices such as smartphones or embedded GPUs are limited. To alleviate this hardware limitation, compressing deep neural networks on the model side becomes important. As one of the most popular compression methods, channel pruning can effectively remove redundant convolutional channels from a CNN (convolutional neural network) without noticeably degrading its performance. Existing methods focus on the pruning design, i.e., evaluating the importance of the different convolutional filters in the CNN model, while a fast and effective fine-tuning method to restore the accuracy lost by pruning is still urgently needed. In this paper, we propose a fine-tuning method, KDFT (Knowledge Distillation Based Fine-Tuning), which improves the accuracy of fine-tuned models with almost negligible training overhead by introducing knowledge distillation. Extensive experimental results on benchmark datasets with representative CNN models show that up to 4.86% accuracy improvement and 79% time savings can be obtained with KDFT.
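The abstract describes KDFT only at a high level. As a rough illustration of the general idea, the sketch below fine-tunes a channel-pruned student network with a standard Hinton-style knowledge-distillation loss, using the original unpruned model as the teacher; the names (`student`, `teacher`, `optimizer`), the temperature `T`, and the weight `alpha` are assumptions for illustration, not the paper's exact KDFT formulation.

```python
# Illustrative sketch, not the authors' exact KDFT method: one fine-tuning step
# that combines a hard cross-entropy loss with a soft knowledge-distillation
# loss from the unpruned teacher network.
import torch
import torch.nn.functional as F


def kd_fine_tune_step(student, teacher, images, labels, optimizer,
                      T=4.0, alpha=0.9):
    """One optimization step for the pruned student under a standard KD loss."""
    teacher.eval()
    with torch.no_grad():
        t_logits = teacher(images)          # soft targets from the unpruned model
    s_logits = student(images)

    # Soft loss: KL divergence between temperature-scaled distributions,
    # scaled by T^2 so gradient magnitudes stay comparable across temperatures.
    soft_loss = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                         F.softmax(t_logits / T, dim=1),
                         reduction="batchmean") * (T * T)

    # Hard loss: ordinary cross-entropy against the ground-truth labels.
    hard_loss = F.cross_entropy(s_logits, labels)

    loss = alpha * soft_loss + (1.0 - alpha) * hard_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Iterating a step like this over the training data, with the pruned model initialized from the weights of the surviving channels, is the usual way such distillation-guided fine-tuning is carried out.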



Author information


Corresponding authors

Correspondence to Hong-Wei Liu  (刘宏伟) or Yi-Lin Chen  (陈熠琳).

Ethics declarations

Conflict of Interest: The authors declare that they have no conflict of interest.

Additional information

This work was supported by the National Natural Science Foundation of China under Grant No. U1866602.

Chong Zhang is currently pursuing his Ph.D. degree in advanced manufacturing in the Faculty of Computing, Harbin Institute of Technology, Harbin. His research interests focus on deep model compression and the development and acceleration of neural networks for industrial applications such as object detection.

Hong-Zhi Wang is a full professor in the Faculty of Computing, Harbin Institute of Technology, Harbin. He received his Ph.D. degree in computer science and technology from Harbin Institute of Technology, Harbin, in 2008. His research fields include big data management and analysis, database systems, knowledge engineering, and data quality.

Hong-Wei Liu is a full professor in the Faculty of Computing, Harbin Institute of Technology, Harbin. He received his Ph.D. degree in computer science and technology from Harbin Institute of Technology, Harbin, in 2004. His research interests mainly include computer architecture, cloud computing, and the Internet of Things.

Yi-Lin Chen is currently an undergraduate student at Harbin Institute of Technology, Harbin. His research interests focus on model compression and industrial applications of neural networks, such as object detection and instance segmentation.


About this article


Cite this article

Zhang, C., Wang, HZ., Liu, HW. et al. Fine-Tuning Channel-Pruned Deep Model via Knowledge Distillation. J. Comput. Sci. Technol. 39, 1238–1247 (2024). https://doi.org/10.1007/s11390-023-2386-8
