Abstract
High-performance deep convolutional neural networks are hard to deploy in many real-world applications, since the computing resources of edge devices such as smartphones or embedded GPUs are limited. To alleviate this hardware limitation, compressing deep neural networks on the model side becomes important. As one of the most popular methods in the spotlight, channel pruning can effectively remove redundant convolutional channels from a CNN (convolutional neural network) without noticeably degrading its performance. Existing methods focus on the pruning design itself, i.e., evaluating the importance of different convolutional filters in the CNN model, whereas a fast and effective fine-tuning method to restore the accuracy of the pruned model is still urgently needed. In this paper, we propose a fine-tuning method, KDFT (Knowledge Distillation Based Fine-Tuning), which improves the accuracy of fine-tuned models with almost negligible training overhead by introducing knowledge distillation. Extensive experimental results on benchmark datasets with representative CNN models show that up to 4.86% accuracy improvement and 79% time saving can be obtained.
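For readers unfamiliar with distillation-based fine-tuning, the sketch below illustrates the general idea the abstract refers to: the pruned network is fine-tuned against a mixture of the ground-truth labels and the softened outputs of a teacher network (typically the original unpruned model). This is a minimal PyTorch sketch of standard Hinton-style distillation under stated assumptions, not the paper's exact KDFT procedure; the function name kd_fine_tune_step and the hyper-parameters temperature and alpha are illustrative choices.

```python
import torch
import torch.nn.functional as F

def kd_fine_tune_step(pruned_model, teacher_model, images, labels,
                      optimizer, temperature=4.0, alpha=0.5):
    """One fine-tuning step for a channel-pruned student model, mixing the
    hard-label loss with a soft-target distillation loss from the teacher."""
    teacher_model.eval()
    with torch.no_grad():
        # Teacher logits are computed without gradients; only the student is updated.
        teacher_logits = teacher_model(images)

    student_logits = pruned_model(images)

    # Standard cross-entropy against the ground-truth labels.
    ce_loss = F.cross_entropy(student_logits, labels)

    # Soft-target loss: KL divergence between temperature-softened teacher and
    # student distributions, scaled by T^2 as in Hinton et al.'s formulation.
    kd_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * (temperature ** 2)

    loss = alpha * ce_loss + (1.0 - alpha) * kd_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the extra cost per step is a single forward pass of the (frozen) teacher, such distillation-guided fine-tuning adds little training overhead, which is consistent with the abstract's claim of almost negligible overhead.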
Ethics declarations
Conflict of Interest The authors declare that they have no conflict of interest.
Additional information
This work was supported by the National Natural Science Foundation of China under Grant No. U1866602.
Chong Zhang is currently pursuing his Ph.D. degree in advanced manufacturing in the Faculty of Computing, Harbin Institute of Technology, Harbin. His research interests focus on deep model compression and on the development and acceleration of neural networks for industrial applications such as object detection.
Hong-Zhi Wang is a full professor in the Faculty of Computing, Harbin Institute of Technology, Harbin. He received his Ph.D. degree in computer science and technology from Harbin Institute of Technology, Harbin, in 2008. His research fields include big data management and analysis, database systems, knowledge engineering, and data quality.
Hong-Wei Liu is a full professor in the Faculty of Computing, Harbin Institute of Technology, Harbin. He received his Ph.D. degree in computer science and technology from Harbin Institute of Technology, Harbin, in 2004. His research interests mainly include computer system structure, cloud computing, and the Internet of Things.
Yi-Lin Chen is currently an undergraduate student at Harbin Institute of Technology, Harbin. His research interests focus on model compression and on industrial applications of neural networks such as object detection and instance segmentation.
About this article
Cite this article
Zhang, C., Wang, HZ., Liu, HW. et al. Fine-Tuning Channel-Pruned Deep Model via Knowledge Distillation. J. Comput. Sci. Technol. 39, 1238–1247 (2024). https://doi.org/10.1007/s11390-023-2386-8