
Neurocomputing

Volume 507, 1 October 2022, Pages 97-106

Channel pruning based on convolutional neural network sensitivity

https://doi.org/10.1016/j.neucom.2022.07.051

Abstract

Pruning is a useful technique for decreasing the memory consumption and floating-point operations (FLOPs) of deep convolutional neural network (CNN) models. Nevertheless, even at modest pruning levels, current structured pruning approaches often cause considerable drops in accuracy. Furthermore, existing approaches typically treat per-layer pruning rates as hyperparameters, neglecting the different sensitivities of the convolutional layers. In this study, we propose a novel sensitivity-based channel pruning method that uses second-order sensitivity as its criterion. The essential idea is to prune insensitive filters while retaining sensitive ones. We quantify the sensitivity of a filter as the sum of the sensitivities of all weights in the filter, rather than with the magnitude-based metrics frequently applied in the literature. In addition, a layer-sensitivity measure based on the Hessian eigenvalues of each layer is introduced to automatically choose the most appropriate pruning rate for each layer. Experiments on a variety of modern CNN architectures demonstrate that our method can considerably increase the pruning rate while sacrificing only a small amount of accuracy, reducing FLOPs by more than 60% on CIFAR-10. Notably, on ImageNet, pruning ResNet50 decreased FLOPs by 56.3% while losing only 0.92% accuracy.

Introduction

Convolutional neural networks (CNNs) have advanced rapidly in recent years and have become the dominant technique in several fields, such as computer vision [62], [63], natural language processing [64], [65], and voice recognition. However, achieving better performance generally requires larger and deeper CNNs. Because of their sheer size, such models are difficult to deploy on the many resource-constrained edge devices, such as mobile processors and robots, that demand real-time inference with limited memory. Many methods have been proposed to compress redundant CNNs and make their computation more efficient and faster. These studies can be broadly classified into the following categories: network quantization [1], [2], matrix decomposition [3], [4], pruning [5], [6], distillation [7], [8], and others. Among these techniques, pruning is one of the most popular and has received considerable attention.

Pruning is typically classified into two categories: weight (unstructured) pruning [9] and channel (structured) pruning [10], [11], [12]. Weight pruning removes individual weights from filters without regard to structure, resulting in a large number of sparse matrix operations. Without specialized hardware or software, such unstructured sparsity can prevent the pruned CNN from being accelerated and from obtaining significant performance gains.

In contrast, the goal of channel pruning is to eliminate entire selected filters and their corresponding channels in each layer, resulting in a model with a regular structure. Among popular channel pruning methods, layer-by-layer approaches are often used: in each layer, informative channels are selected and the least significant channels are pruned by minimizing the reconstruction error of the next layer or by some other criterion. The parameters of all layers are then adjusted simultaneously to restore the model accuracy by fine-tuning the network through retraining. These methods are considerably faster because they prune the channels layer-by-layer and perform the fine-tuning process only once. Consequently, channel pruning is one of the most effective tools for accelerating networks and reducing model size, and it is a more economical and flexible way to achieve a large compression ratio without sacrificing performance.

For both weight and channel pruning, the most essential task is to discover and remove the least important weights or filters. Optimal brain damage (OBD) [13] and optimal brain surgeon (OBS) [14] are foundational techniques that use a Taylor series to estimate the change in the loss function after each weight is removed and prune the weights that lead to the smallest change in loss. These traditional pruning strategies were used to reduce overfitting in shallow neural networks. In a recent study [15], this concept was successfully applied to deep neural networks: the authors proposed L-OBS, a layer-wise pruning approach that prunes the weights of each layer using second-order derivatives.
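For context, the second-order criterion underlying OBD and OBS can be stated as follows (this is the standard formulation of these classical methods, restated in our own notation). Around a trained network the gradient is approximately zero, so a weight perturbation \delta w changes the loss by roughly

\[
\delta L \approx \tfrac{1}{2}\, \delta w^{\top} H\, \delta w ,
\]

where H is the Hessian of the loss with respect to the weights. OBS removes a single weight w_q while optimally adjusting the remaining weights, which yields the saliency

\[
L_q = \frac{w_q^{2}}{2\,[H^{-1}]_{qq}} ,
\]

whereas OBD keeps only the diagonal of H, giving L_q \approx \tfrac{1}{2} h_{qq} w_q^{2}. Weights with the smallest saliency are pruned first.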

The three methods described above are all weight pruning strategies, which result in unstructured sparsity and inefficiency. To solve this issue, we propose a novel sensitivity-based channel pruning solution that leverages second-order information to determine the sensitivity of each filter and prunes the filters with low sensitivity.
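To make the idea concrete, the following minimal sketch ranks the filters of a single convolutional layer by the summed second-order sensitivity of their weights and zeroes out the least sensitive ones. The diagonal (squared-gradient) Hessian proxy, the helper names, and the 50% pruning rate below are illustrative assumptions only; the estimator actually used in this paper is layer-wise and is described in Section 3.

# Hypothetical sketch: rank filters by the summed second-order sensitivity of
# their weights and zero out the least sensitive ones. The diagonal Hessian
# proxy used here is an assumption for illustration, not the paper's estimator.
import torch
import torch.nn as nn

def filter_sensitivities(conv: nn.Conv2d, hess_diag: torch.Tensor) -> torch.Tensor:
    # conv.weight has shape (out_channels, in_channels, kH, kW);
    # hess_diag is an element-wise estimate of the Hessian diagonal with the same shape.
    # OBD-style per-weight saliency 0.5 * h_qq * w_q^2, summed over each filter.
    saliency = 0.5 * hess_diag * conv.weight.detach() ** 2
    return saliency.sum(dim=(1, 2, 3))  # one sensitivity value per output filter

def prune_least_sensitive(conv: nn.Conv2d, hess_diag: torch.Tensor, rate: float) -> torch.Tensor:
    # Zero out the `rate` fraction of filters with the smallest summed sensitivity.
    sens = filter_sensitivities(conv, hess_diag)
    n_prune = int(rate * sens.numel())
    idx = torch.argsort(sens)[:n_prune]  # indices of the insensitive filters
    with torch.no_grad():
        conv.weight[idx] = 0.0
        if conv.bias is not None:
            conv.bias[idx] = 0.0
    return idx  # the caller can later remove these channels structurally

# Example with a dummy layer and a squared-gradient (Fisher-style) proxy for the Hessian diagonal.
conv = nn.Conv2d(16, 32, kernel_size=3, padding=1)
x = torch.randn(8, 16, 14, 14)
loss = conv(x).pow(2).mean()
grad, = torch.autograd.grad(loss, conv.weight)
pruned = prune_least_sensitive(conv, grad ** 2, rate=0.5)
print(f"pruned {pruned.numel()} of {conv.out_channels} filters")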

The main contributions of our study are summarized as follows:

1. We propose a novel modification to OBS, in which the sensitivity of a filter is measured by the total second-order sensitivity of all its weights. Additionally, by estimating the sensitivity of all filters in a layer, we can eliminate insensitive filters while retaining those with a relatively higher sensitivity;

2. We employ a layer-wise strategy to estimate the Hessian matrix and prune filters layer-by-layer to decrease the computational complexity. To produce better results, we adopt an adaptive technique to automatically determine the pruning rate of each layer (a rough sketch of such an allocation follows this list);

3. The proposed method was tested on various network structures and datasets. Our experimental results show that our method achieves state-of-the-art performance. To demonstrate the effectiveness of our approach, we also conducted a comprehensive ablation experiment.
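As a rough illustration of the adaptive rate selection mentioned in contribution 2, the sketch below maps per-layer sensitivities (summarized here by an estimate of each layer's dominant Hessian eigenvalue) to pruning rates, giving less sensitive layers higher rates. The linear mapping, the rate bounds, and the function name are our assumptions for illustration; the exact allocation rule is given in Section 3.

# Hypothetical sketch of adaptive per-layer pruning rates. We assume each layer's
# sensitivity is summarized by its dominant Hessian eigenvalue and that less
# sensitive layers should receive higher pruning rates; the paper's mapping may differ.
from typing import Dict

def allocate_pruning_rates(layer_eigvals: Dict[str, float],
                           base_rate: float = 0.5,
                           min_rate: float = 0.1,
                           max_rate: float = 0.9) -> Dict[str, float]:
    # Normalize sensitivities to [0, 1], then give insensitive layers rates above
    # base_rate and sensitive layers rates below it, clipped to [min_rate, max_rate].
    lo, hi = min(layer_eigvals.values()), max(layer_eigvals.values())
    span = (hi - lo) or 1.0
    rates = {}
    for name, ev in layer_eigvals.items():
        sensitivity = (ev - lo) / span  # 0 = least sensitive, 1 = most sensitive
        rate = base_rate + (0.5 - sensitivity) * (max_rate - min_rate)
        rates[name] = min(max(rate, min_rate), max_rate)
    return rates

# Example with made-up eigenvalue estimates for three layers.
print(allocate_pruning_rates({"conv1": 2.3, "conv2": 0.4, "conv3": 1.1}))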

The remainder of this paper is organized as follows. In Section 2, we cover related efforts in model pruning. A detailed description of our approach is provided in Section 3. Our results and ablation experiments are described in Section 4. Finally, we present our conclusions in Section 5.

Section snippets

Related work

Network pruning has been a topic of discussion for many years, dating back to the 1990s [13], [16]. The works in [5], [17], [18] are among the most well-known early pruning efforts for deep neural networks [58], [59], in which weights were pruned and reasonable results were achieved. As previously stated, however, weight pruning results in unstructured sparsity in a network, hindering its use without specialized software and hardware [19].

Methodology

Our approach builds on previous work on OBD [13], OBS [30], and L-OBS [15], in which the perturbation caused by removing each weight is measured separately and the weights with the smallest perturbation are pruned. These methods yield unstructured pruning, which is difficult to accelerate. To overcome this limitation, we propose grouping the weights and measuring the perturbation that results when an entire group is pruned, because pruning a set of weights, such as weights in the filter
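One way to write down this grouped criterion (our reconstruction, consistent with the filter sensitivity described in the abstract; the full derivation appears later in this section) is to define the sensitivity of a filter F as the sum of the OBS-style saliencies of its weights,

\[
S(F) = \sum_{q \in F} \frac{w_q^{2}}{2\,[H^{-1}]_{qq}} ,
\]

and to remove the filters with the smallest S(F) in each layer.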

Experimental settings

We evaluated our approach on the popular VGG-16 [34] and ResNet [35] architectures using two widely used benchmark datasets, ImageNet [36] and CIFAR-10 [37]. We tested the performance of our pruning method with VGG-16 and ResNet56 on CIFAR-10 and with ResNet50 and ResNet34 on ImageNet, using the accuracy drop and FLOP reduction as metrics. The objective was to achieve better accuracy with fewer FLOPs.
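As a reminder of how the FLOP metric behaves under channel pruning, the snippet below counts the FLOPs of a single convolutional layer and the reduction obtained when its input and output channels are halved. The counting convention (2 FLOPs per multiply-accumulate, biases ignored) and the layer dimensions are our assumptions for illustration, not necessarily those used in the paper.

# Hypothetical FLOP accounting for a conv layer before and after channel pruning.
def conv_flops(c_in: int, c_out: int, k: int, h_out: int, w_out: int) -> int:
    # 2 FLOPs per multiply-accumulate over a k x k kernel and an h_out x w_out output map.
    return 2 * c_in * c_out * k * k * h_out * w_out

# Example: one conv stage of a VGG-like net on 32x32 CIFAR-10 inputs (illustrative numbers).
dense = conv_flops(c_in=128, c_out=256, k=3, h_out=8, w_out=8)
pruned = conv_flops(c_in=64, c_out=128, k=3, h_out=8, w_out=8)  # half the channels kept on both sides
print(f"FLOP reduction: {1 - pruned / dense:.1%}")              # 75.0% for this layer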

Datasets. We conducted experiments on both small

Conclusion

In many cases, existing structured pruning approaches cause considerable accuracy deterioration even at moderate pruning levels. To overcome this, we proposed a novel second-order sensitivity-based structured pruning approach that uses the sum of the sensitivities of the weights in a filter as the criterion for pruning filters in a CNN model. The fundamental notion is to determine the number of filters that should be pruned in each layer and then prune insensitive elements by applying the

CRediT authorship contribution statement

Chenbin Yang: Conceptualization, Methodology, Software, Validation, Writing – original draft. Huiyi Liu: Conceptualization, Methodology, Supervision, Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

This research was supported by the National Key Research and Development Program of China (No. 2019YFE0105200).


References (65)

  • T. Liu et al., NGDNet: Nonuniform Gaussian-label distribution learning for infrared head pose estimation and on-task behavior understanding in the classroom, Neurocomputing, 2021
  • I. Hubara et al., Quantized neural networks: Training neural networks with low precision weights and activations, J. Mach. Learn. Res., 2017
  • Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, Dmitry Kalenichenko,...
  • Xiyu Yu, Tongliang Liu, Xinchao Wang, Dacheng Tao, On compressing deep models by low rank and sparse decomposition, in:...
  • M. Astrid et al., CP-decomposition with tensor power method for convolutional neural networks compression
  • Song Han, Jeff Pool, John Tran, William J. Dally, Learning both weights and connections for efficient neural networks....
  • Pravendra Singh, Vinay Kumar Verma, Piyush Rai, Vinay P, Namboodiri. Play and prune: Adaptive filter pruning for deep...
  • Geoffrey Hinton, Oriol Vinyals, Jeff Dean, Distilling the knowledge in a neural network. arXiv preprint...
  • Hongxu Yin, Pavlo Molchanov, Jose M Alvarez, Zhizhong Li, Arun Mallya, Derek Hoiem, Niraj K Jha, Jan Kautz, Dreaming to...
  • Xia Xiao, Zigeng Wang, Sanguthevar Rajasekaran, Autoprune: Automatic network pruning by regularizing auxiliary...
  • Yihui He, Ji Lin, Zhijian Liu, Hanrui Wang, Li-Jia Li, Song Han, AMC: Automl for model compression and acceleration on...
  • Ruichi Yu, Ang Li, Chun-Fu Chen, Jui-Hsin Lai, Vlad I Morariu, Xintong Han, Mingfei Gao, Ching-Yung Lin, Larry S Davis,...
  • Chenglong Zhao, Bingbing Ni, Jian Zhang, Qiwei Zhao, Wenjun Zhang, Qi Tian, Variational convolutional neural network...
  • Yann LeCun, John S Denker, Sara A Solla, Optimal brain damage, in: Advances in Neural Information Processing Systems,...
  • Babak Hassibi, David G Stork, Second order derivatives for network pruning: Optimal brain surgeon, in: Advances in...
  • Xin Dong, Shangyu Chen, Sinno Pan, Learning to prune deep neural networks via layer-wise optimal brain surgeon, in:...
  • Stephen Jose Hanson, Lorien Y Pratt, Comparing biases for minimal network construction with back-propagation, in:...
  • Song Han, Huizi Mao, William J Dally, Deep compression: Compressing deep neural networks with pruning, trained...
  • Wenlin Chen, James Wilson, Stephen Tyree, Kilian Weinberger, Yixin Chen, Compressing neural networks with the hashing...
  • Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A Horowitz, William J Dally, EIE: efficient inference...
  • Jian-Hao Luo, Jianxin Wu, Weiyao Lin, Thinet: A filter level pruning method for deep neural network compression, in:...
  • Yihui He, Xiangyu Zhang, Jian Sun, Channel pruning for accelerating very deep neural networks, in: Proceedings of the...
  • Hao Li, Asim Kadav, Igor Durdanovic, Hanan Samet, Hans Peter Graf, Pruning filters for efficient convnets. arXiv...
  • A. Polyak et al., Channel-level acceleration of deep face representations, IEEE Access, 2015
  • Hengyuan Hu, Rui Peng, Yu-Wing Tai, Chi-Keung Tang, Network trimming: A data-driven neuron pruning approach towards...
  • Zhuang Liu, Jianguo Li, Zhiqiang Shen, Gao Huang, Shoumeng Yan, Changshui Zhang, Learning efficient convolutional...
  • Yang He, Guoliang Kang, Xuanyi Dong, Yanwei Fu, Yi Yang, Soft filter pruning for accelerating deep convolutional neural...
  • Mingbao Lin, Rongrong Ji, Yan Wang, Yichen Zhang, Baochang Zhang, Yonghong Tian, Ling Shao, Hrank: Filter pruning using...
  • Zhuang Liu, Mingjie Sun, Tinghui Zhou, Gao Huang, Trevor Darrell, Rethinking the value of network pruning. arXiv...
  • Yang He, Ping Liu, Ziwei Wang, Zhilan Hu, and Yi Yang, Filter pruning via geometric median for deep convolutional...
  • Babak Hassibi, David G Stork, Gregory J Wolff, Optimal brain surgeon and general network pruning, in: Proceedings of...

Chenbin Yang received his B.Sc. degree from Huaiyin Normal University, China, in 2015, and his M.Sc. degree from Nantong University, China, in 2019. He is currently pursuing his Ph.D. degree in the College of Computer and Information, Hohai University. His research interests are network pruning and machine learning.

Huiyi Liu received the B.Sc. degree in mechanics from Xi’an Technological University, Xi’an, China, in 1983, the M.Sc. degree in graphics from Huazhong University of Science and Technology, Wuhan, China, in 1987, and the Ph.D. degree in water transportation simulation from Hohai University, Nanjing, China, in 2004. He is currently a Full Professor with the College of Computer and Information, Hohai University. He has more than 30 years of research experience in computer graphics, virtual reality, pattern recognition, and intelligent systems. He has hosted or participated in several research projects supported by the NSFC, MOST, the Ministry of Water Resources, etc.
