Signal Processing

Volume 156, March 2019, Pages 84-91

Channel pruning based on mean gradient for accelerating Convolutional Neural Networks

https://doi.org/10.1016/j.sigpro.2018.10.019

Highlights

  • Channel pruning is applied to reduce the huge memory consumption and high computational complexity of convolutional neural networks.

  • A new pruning criterion based on the mean gradient effectively measures the importance of channels to network performance.

  • A hierarchical global pruning strategy, which improves on the global pruning strategy, achieves a significant reduction in the Floating Point Operations (FLOPs) of networks.

Abstract

Convolutional Neural Networks (CNNs) are getting deeper and wider to improve their performance, which in turn increases their computational complexity. We apply channel pruning to accelerate CNNs and reduce their computational cost. A new pruning criterion based on the mean gradient is proposed for convolutional kernels. To significantly reduce the Floating Point Operations (FLOPs) of CNNs, a hierarchical global pruning strategy is introduced. In each pruning step, the importance of convolutional kernels is evaluated by the mean gradient criterion, and the hierarchical global pruning strategy removes the less important kernels, yielding a smaller CNN model. Finally, we fine-tune the model to restore network performance. Experimental results show that a VGG-16 network pruned by channel pruning on CIFAR-10 achieves a 5.64× reduction in FLOPs with less than a 1% decrease in accuracy, while a ResNet-110 network pruned on CIFAR-10 achieves a 2.48× reduction in FLOPs and parameters with only a 0.08% decrease in accuracy.

Introduction

Convolutional Neural Networks (CNNs) have achieved remarkable success in various recognition tasks [1], [2], [3], especially in computer vision [4], [5], [6]. CNNs achieve state-of-the-art performance in these fields compared with traditional methods based on manually designed visual features [7]. However, these deep neural networks have a huge number of parameters. For example, the AlexNet [4] network contains about 6 × 10⁷ parameters, while a better-performing network such as VGG [6] contains about 1.44 × 10⁸ parameters, which leads to higher memory and computational costs. For instance, the VGG-16 model takes up more than 500 MB of storage and needs 1.56 × 10¹⁰ Floating Point Operations (FLOPs) to classify a single image. The huge memory and high computational costs of CNNs restrict the application of deep learning on mobile devices with limited resources [8]. Moreover, deep learning models are known to be over-parameterized [9]. Denil et al. [10] pointed out that deep neural networks can be reconstructed from a subset of their parameters without affecting network performance, which means that neural network models contain a huge number of redundant connections, and we can reduce memory and computational costs by pruning and compressing such connections [11], [12].
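
As a rough illustration of where such FLOPs figures come from, the following Python sketch counts the operations of a single convolutional layer under the common convention that a multiply and an add are counted separately; the layer shape used (the first 3 × 3 convolution of VGG-16 on a 224 × 224 image) is only an example, not the full network.

```python
def conv_flops(c_in, c_out, k, h_out, w_out):
    """FLOPs of one convolutional layer, counting a multiply-accumulate as 2 operations."""
    return 2 * c_in * k * k * c_out * h_out * w_out

# First VGG-16 convolutional layer on a 224x224 RGB image: 3 -> 64 channels, 3x3 kernels.
print(conv_flops(3, 64, 3, 224, 224))  # ~1.7e8 FLOPs for this single layer alone
```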

The huge memory consumption and high computational complexity of deep neural networks drive research on compression [13], [14] and acceleration algorithms [15], [16], and pruning [17] is one of the effective methods. In the 1990s, LeCun et al. [18] introduced the Optimal Brain Damage pruning strategy; they observed that several unimportant weight connections could be safely removed from a well-trained network with negligible impact on network performance. Hassibi et al. [19] proposed the similar Optimal Brain Surgeon pruning strategy and pointed out that the importance of a weight is determined by the second derivative. However, these two methods need to calculate the Hessian matrix, which increases the memory consumption and computational complexity of the network model. Recently, Han et al. [20], [21] reported impressive compression rates and an effective decrease in the number of parameters of the AlexNet and VGG networks by pruning weight connections with small magnitudes and then retraining without hurting overall accuracy. The decrease in parameters was mainly concentrated in the fully connected layers, which achieved a 3∼4× speedup of those layers during inference. However, this pruning operation generates an unstructured [22] sparse model, which additionally requires sparse BLAS libraries [23] or even specialized hardware to achieve acceleration [16]. Similar to our study, Li et al. [24] measured the relative importance of a convolutional kernel in each layer by calculating the sum of its absolute weights, i.e., its ℓ1 norm. Compared with this minimum weight criterion [24], our criterion is based on the mean gradient of the feature maps in each layer, which more directly reflects the importance of the features extracted by the convolutional kernels. Another pruning criterion exploits the sparsity of activations after the non-linear ReLU [25] mapping. Hu et al. [26] argued that if most outputs of a neuron after the non-linearity are zero, that neuron is more likely to be redundant; their criterion measures the importance of a neuron by its Average Percentage of Zeros (APoZ). However, the APoZ criterion requires threshold parameters, which vary from layer to layer. These two criteria reflect the importance of channels simply and intuitively through convolutional kernels or feature maps, but they do not directly consider the final loss after pruning. In this paper, the pruning algorithm is based on the importance of the feature map in each channel and takes into account the effect of pruning a channel on network performance. Meanwhile, a hierarchical global pruning strategy and a FLOPs constraint are introduced to significantly reduce the network FLOPs.
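
To make the criteria discussed above concrete, the following PyTorch sketch gives per-channel importance scores for the minimum weight (ℓ1 norm) criterion [24], the APoZ criterion [26], and a mean-gradient style score. The exact form of the paper's mean-gradient criterion is defined in Section 3; the version below, like all function names here, is only an illustrative assumption.

```python
import torch

def l1_norm_score(conv_weight):
    # Minimum weight criterion [24]: sum of absolute kernel weights per output channel.
    # conv_weight has shape (C_out, C_in, K, K).
    return conv_weight.abs().sum(dim=(1, 2, 3))

def apoz_score(relu_output):
    # APoZ [26]: average percentage of zeros per channel after the ReLU.
    # relu_output has shape (N, C, H, W); a higher score suggests a more redundant channel.
    return (relu_output == 0).float().mean(dim=(0, 2, 3))

def mean_gradient_score(feature_map_grad):
    # Sketch of a mean-gradient style score (an assumption; see Section 3 for the paper's
    # definition): mean absolute gradient of the loss w.r.t. each channel's feature map,
    # averaged over the batch and spatial positions.
    # feature_map_grad has shape (N, C, H, W), e.g. collected with backward hooks.
    return feature_map_grad.abs().mean(dim=(0, 2, 3))
```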

The rest of the paper is organized as follows. Section 2 describes channel pruning for CNNs with different structures. Section 3 proposes the pruning criterion based on the mean gradient and the hierarchical global pruning strategy. Section 4 demonstrates the effectiveness of the algorithm through experimental comparisons. Section 5 concludes the paper.

Section snippets

Pruning channels and corresponding feature maps

This paper mainly studies the effect of channel pruning on reducing network FLOPs. Convolutional layers account for more than 90% of the FLOPs of common CNNs [27]. Therefore, we only prune convolutional layers; Sections 2.1 and 2.2 implement the pruning of channels and their corresponding feature maps for different network structures, respectively.
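
As an illustration of what pruning a channel and its corresponding feature map involves, the following PyTorch sketch removes one output channel (kernel) of a convolutional layer together with the matching input channel of the next layer in a plain, VGG-like network; the function and its signature are hypothetical, and networks with shortcut connections (Section 2.2) need additional handling.

```python
import torch
import torch.nn as nn

def prune_channel(conv_i: nn.Conv2d, conv_next: nn.Conv2d, channel: int):
    """Remove output channel `channel` of conv_i and the matching input channel of conv_next."""
    keep = torch.tensor([c for c in range(conv_i.out_channels) if c != channel])

    # Rebuild layer i without the pruned kernel.
    new_i = nn.Conv2d(conv_i.in_channels, len(keep), conv_i.kernel_size,
                      conv_i.stride, conv_i.padding, bias=conv_i.bias is not None)
    new_i.weight.data = conv_i.weight.data[keep].clone()
    if conv_i.bias is not None:
        new_i.bias.data = conv_i.bias.data[keep].clone()

    # Rebuild layer i+1 without the corresponding input channel.
    new_next = nn.Conv2d(len(keep), conv_next.out_channels, conv_next.kernel_size,
                         conv_next.stride, conv_next.padding,
                         bias=conv_next.bias is not None)
    new_next.weight.data = conv_next.weight.data[:, keep].clone()
    if conv_next.bias is not None:
        new_next.bias.data = conv_next.bias.data.clone()
    return new_i, new_next
```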

Channel pruning strategy

The proposed channel pruning strategy consists of the following steps: (1) start from a pre-trained network model; (2) evaluate the importance of the feature map on each channel by the mean gradient criterion; (3) adopt a hierarchical global pruning strategy to prune the less important channels and their corresponding feature maps; (4) alternate iterations of pruning and fine-tuning; (5) stop pruning once the desired pruning target is achieved. The flow chart is depicted in Fig. 3. Our desired …
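
A minimal sketch of the control flow in steps (1)-(5) is given below; the callables score_fn, prune_fn, finetune_fn and flops_fn are hypothetical placeholders for the mean gradient criterion, the hierarchical global pruning step, fine-tuning and FLOPs counting, and are not part of the paper.

```python
def channel_pruning(model, score_fn, prune_fn, finetune_fn, flops_fn,
                    target_flops, channels_per_step=64):
    """Iterative channel pruning loop (sketch of steps (1)-(5))."""
    # Step (1): `model` is a pre-trained network.
    while flops_fn(model) > target_flops:                   # step (5): stop at the FLOPs target
        scores = score_fn(model)                            # step (2): mean-gradient importance
        model = prune_fn(model, scores, channels_per_step)  # step (3): hierarchical global pruning
        model = finetune_fn(model)                          # step (4): fine-tune to recover accuracy
    return model
```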

Experiments

To verify the validity of our algorithm, the following experiments are conducted. The effect on network accuracy of removing channels in different orders of mean gradient is examined in Section 4.1, which indicates that channels with a larger mean gradient are more important to network performance. The comparison between our strategy and the global pruning strategy is shown in Section 4.2. Comparisons of different pruning criteria are given in Section 4.3, which show that our algorithm can …

Conclusion

In this paper, we apply channel pruning to accelerate CNNs, introduce a new criterion based on the mean gradient of feature maps, and propose a hierarchical global pruning strategy to effectively reduce network FLOPs. In each pruning step, we measure the importance of the feature map on each channel by its mean gradient and use the hierarchical global pruning strategy to remove the less important feature maps, obtaining a smaller network model. We focus on the effect of removing feature maps on …

Acknowledgment

This work was supported by the National Natural Science Foundation of China (Grant Number: 61801325), the Natural Science Foundation of Tianjin City (Grant Number: 18JCQNJC00600) and the Huawei Innovation Research Program (Grant Number: HO2018085138).

References (29)

  • R. Girshick

    Fast R-CNN

    IEEE International Conference on Computer Vision

    (2015)
  • H. Noh et al.

    Learning deconvolution network for semantic segmentation

    IEEE International Conference on Computer Vision

    (2016)
  • X. Jia et al.

    Guiding the long-short term memory model for image caption generation

    IEEE International Conference on Computer Vision

    (2016)
  • A. Krizhevsky et al.

    ImageNet classification with deep convolutional neural networks

    International Conference on Neural Information Processing Systems

    (2012)
  • K. Simonyan et al.

    Very deep convolutional networks for large-scale image recognition

    arXiv preprint arXiv:1409.1556

    (2014)
  • K. He et al.

    Deep residual learning for image recognition

    Computer Vision and Pattern Recognition

    (2016)
  • M. Kuhn et al.

    An introduction to feature selection

    Applied Predictive Modeling

    (2013)
  • C. Szegedy et al.

    Rethinking the inception architecture for computer vision

    Computer Vision and Pattern Recognition

    (2016)
  • Y.D. Kim et al.

    Compression of deep convolutional neural networks for fast and low power mobile applications

    Comput. Sci.

    (2015)
  • M. Denil et al.

    Predicting parameters in deep learning

    Advances in Neural Information Processing Systems

    (2013)
  • E.L. Denton et al.

    Exploiting linear structure within convolutional networks for efficient evaluation

    Advances in Neural Information Processing Systems

    (2014)
  • G.E. Hinton et al.

    Improving neural networks by preventing co-adaptation of feature detectors

    Comput. Sci.

    (2012)
  • H. Zhou et al.

    Less is more: towards compact CNNs

    European Conference on Computer Vision

    (2016)
  • A. Novikov et al.

    Tensorizing neural networks

    Advances in Neural Information Processing Systems

    (2015)