RFPruning: A retraining-free pruning method for accelerating convolutional neural networks

https://doi.org/10.1016/j.asoc.2021.107860

Highlights

  • A retraining-free pruning framework is proposed to compress CNN models.

  • An ADMM-based sparse-learning method is introduced to make CNN models prunable.

  • A GA-based pruning strategy is designed to obtain the optimal pruned CNN models.

  • The proposed RFPruning works as well as the methods with retraining, but faster.

Abstract

Network pruning has been developed as a remedy for accelerating the inference of deep convolutional neural networks (DCNNs). The mainstream methods retrain the pruned models, which maintains their performance but consumes a great deal of time. Other methods reduce the time consumption by omitting retraining, but they sacrifice performance. To resolve this conflict, we propose a two-stage Retraining-Free pruning method, named RFPruning, which embeds a rough screening of channels into training and fine-tunes the structures during pruning, to achieve both good performance and low time consumption. In the first stage, network training is reformulated as a constrained optimization problem and solved by a sparse-learning approach for rough channel selection. In the second stage, the pruning process is regarded as a multiobjective optimization problem, where a genetic algorithm is applied to carefully select channels for a trade-off between performance and model size. The proposed method is evaluated against several DCNNs on the CIFAR-10 and ImageNet datasets. Extensive experiments demonstrate that such a retraining-free pruning method obtains 43.0%–88.4% compression in model size and maintains accuracy comparable to the methods with retraining, while achieving a 3× speed-up in pruning.

Introduction

Benefiting from their powerful feature-extraction capacity, deep convolutional neural networks (DCNNs) have achieved significant advances in various computer vision tasks. However, this powerful learning capacity relies on expensive storage and computational resources, which restricts the application of DCNNs on mobile devices. This issue drives research on network acceleration algorithms [1], [2], [3], [4], among which network pruning is one of the most effective methods.

In general, network pruning aims to remove unimportant network parameters and can be divided into two categories, i.e., unstructured pruning [5], [6] and structured pruning [7], [8], [9]. The unstructured pruning methods achieve high compression ratios because arbitrary weights in the network may be pruned. However, they destroy the form of the weight matrix and need special hardware to store the pruned weights, which limits their use in real hardware implementations. The structured pruning methods reduce the number of parameters by pruning whole channels or filters of the network. In this way, the size of the weight matrix is compressed while its structure is preserved. Therefore, the structured pruning methods are more flexible for most hardware implementations and have received increasing attention.
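To make the distinction concrete, the following minimal sketch (in PyTorch, purely illustrative and not taken from any of the cited methods) shows channel-level structured pruning of a convolutional weight tensor: whole output channels are removed, so the remaining tensor stays dense and hardware-friendly. The L1-norm importance score and the keep ratio are our own assumptions for illustration.

    import torch

    def channel_prune(weight: torch.Tensor, keep_ratio: float) -> torch.Tensor:
        """Structured pruning sketch: drop whole output channels of a conv weight.

        weight has shape (out_channels, in_channels, kH, kW). Channels are ranked
        by their L1 norm and the weakest ones are removed entirely, so the result
        is a smaller but still dense tensor that ordinary hardware can run directly.
        """
        n_keep = max(1, int(round(weight.shape[0] * keep_ratio)))
        scores = weight.abs().sum(dim=(1, 2, 3))            # per-channel importance
        keep_idx = torch.topk(scores, n_keep).indices.sort().values
        return weight[keep_idx]                              # shape (n_keep, in, kH, kW)

    # Example: a 64-channel 3x3 convolution pruned to half of its output channels.
    w = torch.randn(64, 32, 3, 3)
    print(channel_prune(w, 0.5).shape)                       # torch.Size([32, 32, 3, 3])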

The framework of most existing structured pruning methods [10], [11] contains three stages: training, pruning, and retraining. As revealed by Z. Liu et al. [12] and X. Ding et al. [13], such a framework has two major drawbacks: (1) the optimization process, which requires iterative pruning and retraining, is computation-intensive and time-consuming; (2) the pruned models are easily trapped in bad local minima at the retraining stage. To overcome these two drawbacks, some works (e.g., GAL [14] and VCNNP [15]) propose to omit the retraining stage and directly obtain the target pruned models from the trained models. However, the compression ratio and accuracy they achieve are inferior to those of the methods with retraining. Thus, achieving both good performance and fast deployment remains a challenge.

In this paper, we propose a retraining-free structured pruning method, named RFPruning, for fast and efficient network pruning. To obtain well-performing pruned models directly from the original models, we embed a rough screening of channels into training and fine-tune the structures during pruning. In the training stage, network training is reformulated as an optimization problem with a sparsity constraint, expressed as an l0 norm in the loss function. Since the l0 norm is not differentiable, a sparse-learning approach based on the Alternating Direction Method of Multipliers (ADMM) is proposed to minimize the loss function. Through this optimization, unimportant channels are roughly identified and invalidated, which prevents significant channels from being pruned by mistake and makes the models prunable. In the pruning stage, the task of obtaining the optimal pruned model from the trained model is treated as a multiobjective optimization problem whose objectives are accuracy and compression ratio. To trade off performance against model size, a genetic algorithm is used to fine-tune the structure of the pruned models by searching a suitable pruning rate for each layer (illustrative sketches of both stages follow the contribution list below). The contributions of this work can be summarized as follows:

  • A retraining-free pruning framework, which contains sparse-learning and automatic searching, is proposed for both the fast pruning process and good model performance.

  • An ADMM-based sparse-learning technique is embedded into training to roughly identify and invalidate insignificant channels during training, which prevents important channels from being removed by mistake and makes the DCNN models prunable.

  • A genetic algorithm-based pruning strategy is developed to fine-tune the network structure for a trade-off between performance and model size. This strategy automatically obtains the optimal pruning rates for models while requiring no additional networks to be optimized, unlike other automatic strategies (e.g., GAL [14] and MetaPruning [16]).

  • Extensive experiments on CIFAR-10 and ImageNet demonstrate that the proposed retraining-free pruning method works as well as the competing methods with retraining, on several popular DCNN models, including VGGNet [17], ResNet [18], GoogLeNet [19], and DenseNet [20].
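To make the two stages more concrete, we give two minimal sketches. The first illustrates an ADMM-style update under a channel-level l0 constraint: an auxiliary copy of a convolutional weight is projected so that only the k output channels with the largest norms survive, and the scaled dual variable absorbs the gap, while the weights themselves would be updated by SGD on the augmented loss. The tensor layout, the L2 channel norm, and the function names are assumptions made for illustration, not the paper's exact formulation.

    import torch

    def project_channels_l0(w: torch.Tensor, k: int) -> torch.Tensor:
        """Euclidean projection onto {Z : at most k non-zero output channels}.

        Channels are ranked by their L2 norm; the k strongest channels are kept
        verbatim and the remaining ones are zeroed out (invalidated).
        """
        norms = w.flatten(1).norm(dim=1)                  # one score per output channel
        mask = torch.zeros_like(norms, dtype=torch.bool)
        mask[torch.topk(norms, k).indices] = True
        return w * mask.view(-1, 1, 1, 1)

    def admm_auxiliary_step(w: torch.Tensor, u: torch.Tensor, k: int):
        """One ADMM step for the auxiliary variable Z and the scaled dual U.

        W itself is updated elsewhere by SGD on loss(W) + (rho/2) * ||W - Z + U||^2;
        here Z absorbs the l0 constraint via projection and U accumulates the gap.
        """
        z = project_channels_l0(w + u, k)                 # Z-update: projection step
        u = u + w - z                                     # dual-variable update
        return z, u

    # Example: restrict a 64-channel convolution to 32 active channels.
    w = torch.randn(64, 32, 3, 3)
    z, u = admm_auxiliary_step(w, torch.zeros_like(w), k=32)
    print((z.flatten(1).norm(dim=1) > 0).sum().item())    # 32 channels remain active

The second sketch outlines the pruning stage as a small genetic-algorithm loop over per-layer pruning rates. The evaluate callback, the truncation-style selection (the paper cites roulette-wheel selection), and all hyper-parameters are hypothetical placeholders for the accuracy/compression objectives described above.

    import random

    def ga_search(evaluate, num_layers, pop_size=20, generations=50, alpha=1.0):
        """GA sketch: each chromosome holds one pruning rate per layer.

        evaluate(rates) must return (accuracy, size_ratio) for the model pruned
        with those rates; the fitness trades accuracy against model size.
        """
        def fitness(rates):
            acc, size = evaluate(rates)
            return acc - alpha * size
        pop = [[random.uniform(0.1, 0.9) for _ in range(num_layers)]
               for _ in range(pop_size)]
        for _ in range(generations):
            parents = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
            children = []
            while len(parents) + len(children) < pop_size:
                a, b = random.sample(parents, 2)
                cut = random.randrange(1, num_layers)      # one-point crossover
                child = a[:cut] + b[cut:]
                i = random.randrange(num_layers)           # single-gene mutation
                child[i] = min(0.9, max(0.1, child[i] + random.gauss(0.0, 0.05)))
                children.append(child)
            pop = parents + children
        return max(pop, key=fitness)

    # Toy example: accuracy degrades mildly with pruning, size shrinks with the mean rate.
    toy = lambda rates: (0.95 - 0.1 * max(rates), 1.0 - sum(rates) / len(rates))
    best_rates = ga_search(toy, num_layers=13, generations=10)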

The organization of this paper is as follows. The relevant literature is reviewed and discussed in Section 2. The proposed method is introduced in detail in Section 3. In Section 4, we report and analyze the experimental results. Finally, this paper is concluded in Section 5.

Related work

Unstructured pruning: The unstructured pruning methods can be traced back to Optimal Brain Damage [21] and Optimal Brain Surgeon [22], which pruned network parameters according to the Hessian matrix of the weights. SNIP [5] proposed to prune network parameters based on the derivatives of the weights. Deep Compression [6] and S. Han et al. [23] evaluated the absolute values of the weights and pruned the small ones. These methods often achieve a high compression ratio because they can prune arbitrary weights
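For context, the magnitude criterion used by these works can be sketched in a few lines; the snippet below is a generic illustration under a single global threshold, not the cited authors' implementation, and the 90% sparsity level is arbitrary.

    import torch

    def magnitude_prune(weights, sparsity):
        """Unstructured pruning sketch: zero out the smallest-magnitude weights.

        All tensors are pooled to pick one global threshold, then each tensor is
        masked element-wise. The shapes stay the same but the tensors become
        sparse, which is why special storage formats or hardware are needed.
        """
        all_w = torch.cat([w.abs().flatten() for w in weights])
        k = max(1, int(sparsity * all_w.numel()))          # number of weights to remove
        threshold = torch.kthvalue(all_w, k).values
        return [w * (w.abs() > threshold) for w in weights]

    # Example: prune roughly 90% of the weights of two random layers.
    layers = [torch.randn(64, 32, 3, 3), torch.randn(128, 64)]
    pruned = magnitude_prune(layers, 0.9)
    print(sum((p == 0).sum().item() for p in pruned))      # about 90% of all entries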

Proposed method

This section describes the proposed pruning method in detail. First, the motivations of the proposed method are introduced in Section 3.1. Second, the overall framework of the method is described in Section 3.2. Third, the training approach is introduced in Section 3.3. Fourth, the automatic pruning strategy is proposed in Section 3.4. Finally, the special operations for pruning networks with cross-layer connections are illustrated in Section 3.5.

Experiments

In this section, the performance of the proposed method is demonstrated and analyzed through a series of experiments. The experimental settings are introduced in Section 4.1. The practicality of the proposed RFPruning is evaluated in Section 4.2. In addition, the ablation experiments are conducted in Section 4.3 to analyze the effects of the proposed sparse-learning scheme and automatic pruning strategy in more detail.

Conclusion

In this paper, we propose a retraining-free pruning method, named RFPruning, to compress DCNNs quickly while maintaining their performance. The network pruning task is formulated as two optimization problems, which are solved by ADMM-based sparse learning and a genetic algorithm, respectively. Experimental results showed that the ADMM-based sparse learning is beneficial for making the DCNN models prunable and the application of the genetic algorithm is effective in improving

CRediT authorship contribution statement

Zhenyu Wang: Conceptualization, Methodology, Software, Investigation, Writing – original draft, Writing – review & editing. Xuemei Xie: Writing – review & editing, Resources, Project administration, Funding acquisition. Guangming Shi: Supervision, Resources.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grants 61632019 and 61836008, and in part by the National Key R&D Program of China under Grant 2020AAA0109301.

References (58)

  • A. Lipowski et al., Roulette-wheel selection via stochastic acceptance, Physica A (2012).

  • Z. Wang et al., Network pruning using sparse learning and genetic algorithm, Neurocomputing (2020).

  • M. Denil, B. Shakibi, L. Dinh, M. Ranzato, N. de Freitas, Predicting parameters in deep learning, in: Proceedings of...

  • M. Jaderberg et al., Speeding up convolutional neural networks with low rank expansions.

  • H. Yang et al., Energy-constrained compression for deep neural networks via weighted sparse projection and layer input masking.

  • H. Yang et al., ECC: Platform-independent energy-constrained deep neural network compression via a bilinear regression model.

  • N. Lee et al., SNIP: Single-shot network pruning based on connection sensitivity.

  • S. Han et al., Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding.

  • H. Li et al., Pruning filters for efficient ConvNets.

  • S. Anwar et al., Structured pruning of deep convolutional neural networks, ACM J. Emerg. Technol. Comput. Syst. (JETC) (2017).

  • P. Molchanov et al., Pruning convolutional neural networks for resource efficient inference.

  • J. Ye et al., Rethinking the smaller-norm-less-informative assumption in channel pruning of convolution layers.

  • W. Wen et al., Learning structured sparsity in deep neural networks.

  • Z. Liu et al., Rethinking the value of network pruning.

  • X. Ding et al., Lossless CNN channel pruning via decoupling remembering and forgetting (2020).

  • S. Lin et al., Towards optimal structured CNN pruning via generative adversarial learning.

  • C. Zhao et al., Variational convolutional neural network pruning.

  • Z. Liu, H. Mu, X. Zhang, Z. Guo, X. Yang, K.-T. Cheng, J. Sun, Metapruning: Meta learning for automatic neural network...

  • K. Simonyan et al., Very deep convolutional networks for large-scale image recognition, Comput. Sci. (2014).

  • K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE conference...

  • C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with...

  • G. Huang, Z. Liu, L. Van Der Maaten, K.Q. Weinberger, Densely connected convolutional networks, in: Proceedings of the...

  • Y. LeCun et al., Optimal brain damage.

  • B. Hassibi et al., Second order derivatives for network pruning: Optimal brain surgeon.

  • S. Han et al., Learning both weights and connections for efficient neural network.

  • P. Molchanov et al., Importance estimation for neural network pruning.

  • J.-H. Luo, J. Wu, W. Lin, Thinet: A filter level pruning method for deep neural network compression, in: Proceedings of...

  • Y. He et al., Filter pruning via geometric median for deep convolutional neural networks acceleration.

  • H. Hu et al., Network trimming: A data-driven neuron pruning approach towards efficient deep architectures (2016).