Abstract
High-performance deep convolutional neural networks are hard to deploy in many real-world applications, since the computing resources of edge devices such as smartphones or embedded GPUs are limited. To alleviate this hardware limitation, compressing deep neural networks on the model side becomes important. As one of the most popular methods in the spotlight, channel pruning can effectively remove redundant convolutional channels from a CNN (convolutional neural network) without noticeably degrading its performance. Existing methods focus on the pruning design itself, i.e., evaluating the importance of different convolutional filters in the CNN model, whereas a fast and effective fine-tuning method to restore the accuracy of the pruned model is still urgently needed. In this paper, we propose a fine-tuning method, KDFT (Knowledge Distillation Based Fine-Tuning), which improves the accuracy of fine-tuned models with almost negligible training overhead by introducing knowledge distillation. Extensive experimental results on benchmark datasets with representative CNN models show that up to 4.86% accuracy improvement and 79% time saving can be obtained.
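For readers unfamiliar with distillation-based fine-tuning, the sketch below illustrates the general idea the abstract refers to: the pruned network is fine-tuned against a mixture of the ground-truth labels and the softened outputs of a teacher network (typically the original unpruned model). This is a minimal PyTorch sketch of standard Hinton-style distillation under stated assumptions, not the paper's exact KDFT procedure; the function name kd_fine_tune_step and the hyper-parameters temperature and alpha are illustrative choices.

```python
import torch
import torch.nn.functional as F

def kd_fine_tune_step(pruned_model, teacher_model, images, labels,
                      optimizer, temperature=4.0, alpha=0.5):
    """One fine-tuning step for a channel-pruned student model, mixing the
    hard-label loss with a soft-target distillation loss from the teacher."""
    teacher_model.eval()
    with torch.no_grad():
        # Teacher logits are computed without gradients; only the student is updated.
        teacher_logits = teacher_model(images)

    student_logits = pruned_model(images)

    # Standard cross-entropy against the ground-truth labels.
    ce_loss = F.cross_entropy(student_logits, labels)

    # Soft-target loss: KL divergence between temperature-softened teacher and
    # student distributions, scaled by T^2 as in Hinton et al.'s formulation.
    kd_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * (temperature ** 2)

    loss = alpha * ce_loss + (1.0 - alpha) * kd_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the extra cost per step is a single forward pass of the (frozen) teacher, such distillation-guided fine-tuning adds little training overhead, which is consistent with the abstract's claim of almost negligible overhead.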
Ethics declarations
Conflict of Interest The authors declare that they have no conflict of interest.
Additional information
This work was supported by the National Natural Science Foundation of China under Grant No. U1866602.
Chong Zhang is currently pursuing his Ph.D. degree in advanced manufacturing in the Faculty of Computing, Harbin Institute of Technology, Harbin. His research interests focus on deep model compression and on the development and acceleration of neural networks for industrial applications such as object detection.
Hong-Zhi Wang is a full professor in the Faculty of Computing, Harbin Institute of Technology, Harbin. He received his Ph.D. degree in computer science and technology from Harbin Institute of Technology, Harbin, in 2008. His research fields include big data management and analysis, database systems, knowledge engineering, and data quality.
Hong-Wei Liu is a full professor in the Faculty of Computing, Harbin Institute of Technology, Harbin. He received his Ph.D. degree in computer science and technology from Harbin Institute of Technology, Harbin, in 2004. His research interests mainly include computer system structure, cloud computing, and the Internet of Things.
Yi-Lin Chen is currently an undergraduate student at Harbin Institute of Technology, Harbin. His research interests focus on model compression and on industrial applications of neural networks such as object detection and instance segmentation.
About this article
Cite this article
Zhang, C., Wang, HZ., Liu, HW. et al. Fine-Tuning Channel-Pruned Deep Model via Knowledge Distillation. J. Comput. Sci. Technol. 39, 1238–1247 (2024). https://doi.org/10.1007/s11390-023-2386-8