RFPruning: A retraining-free pruning method for accelerating convolutional neural networks

https://doi.org/10.1016/j.asoc.2021.107860

Highlights

  • A retraining-free pruning framework is proposed to compress CNN models.

  • An ADMM-based sparse-learning method is introduced to make CNN models prunable.

  • A GA-based pruning strategy is designed to obtain the optimal pruned CNN models.

  • The proposed RFPruning works as well as the methods with retraining, but faster.

Abstract

Network pruning has been developed as a remedy for accelerating the inference of deep convolutional neural networks (DCNNs). The mainstream methods retrain the pruned models, which maintains their performance but consumes a great deal of time. Other methods reduce the time consumption by omitting retraining, but they sacrifice performance. To resolve this conflict, we propose a two-stage Retraining-Free pruning method, named RFPruning, which embeds a rough screening of channels into training and fine-tunes the structures during pruning, to achieve both good performance and low time consumption. In the first stage, network training is reformulated as a constrained optimization problem and solved by a sparse-learning approach for rough channel selection. In the second stage, the pruning process is regarded as a multiobjective optimization problem, where a genetic algorithm is applied to carefully select channels for a trade-off between performance and model size. The proposed method is evaluated against several DCNNs on the CIFAR-10 and ImageNet datasets. Extensive experiments demonstrate that such a retraining-free pruning method obtains 43.0%–88.4% compression in model size and maintains accuracy comparable to the methods with retraining, while achieving a 3× speed-up in pruning.

Introduction

Benefiting from their powerful feature-extraction capacity, deep convolutional neural networks (DCNNs) have achieved significant advances in various computer vision tasks. However, this powerful learning capacity relies on expensive storage and computational resources, which restricts the application of DCNNs on mobile devices. This issue drives research on network acceleration algorithms [1], [2], [3], [4], among which network pruning is one of the most effective methods.

In general, network pruning aims to remove unimportant network parameters and can be divided into two categories, i.e., unstructured pruning [5], [6] and structured pruning [7], [8], [9]. The unstructured pruning methods achieve high compression ratios because arbitrary weights in the network may be pruned. However, they destroy the form of the weight matrix and need special hardware to store the pruned weights, which limits their use in real hardware implementations. The structured pruning methods reduce the number of parameters by pruning whole channels or filters of the network. In this way, the size of the weight matrix is compressed while its structure is preserved. Therefore, the structured pruning methods are more flexible for most hardware implementations and have received increasing attention.
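To make the distinction concrete, the following minimal sketch (in PyTorch, purely illustrative and not taken from any of the cited methods) shows channel-level structured pruning of a convolutional weight tensor: whole output channels are removed, so the remaining tensor stays dense and hardware-friendly. The L1-norm importance score and the keep ratio are our own assumptions for illustration.

    import torch

    def channel_prune(weight: torch.Tensor, keep_ratio: float) -> torch.Tensor:
        """Structured pruning sketch: drop whole output channels of a conv weight.

        weight has shape (out_channels, in_channels, kH, kW). Channels are ranked
        by their L1 norm and the weakest ones are removed entirely, so the result
        is a smaller but still dense tensor that ordinary hardware can run directly.
        """
        n_keep = max(1, int(round(weight.shape[0] * keep_ratio)))
        scores = weight.abs().sum(dim=(1, 2, 3))            # per-channel importance
        keep_idx = torch.topk(scores, n_keep).indices.sort().values
        return weight[keep_idx]                              # shape (n_keep, in, kH, kW)

    # Example: a 64-channel 3x3 convolution pruned to half of its output channels.
    w = torch.randn(64, 32, 3, 3)
    print(channel_prune(w, 0.5).shape)                       # torch.Size([32, 32, 3, 3])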

The framework of most existing structured pruning methods [10], [11] contains three stages: training, pruning, and retraining. As revealed by Z. Liu et al. [12] and X. Ding et al. [13], such a framework has two major drawbacks: (1) the optimization process, which requires iterative pruning and retraining, is computation-intensive and time-consuming; (2) the pruned models are easily trapped in bad local minima at the retraining stage. To overcome these two drawbacks, some works (e.g., GAL [14] and VCNNP [15]) propose to omit the retraining stage and directly obtain the target pruned models from the trained models. However, the compression ratio and accuracy they achieve are inferior to those of the methods with retraining. Thus, achieving both good performance and fast deployment remains a challenge.

In this paper, we propose a retraining-free structured pruning method, named RFPruning, for fast and efficient network pruning. To obtain well-performing pruned models directly from the original models, we embed a rough screening of channels into training and fine-tune the structures during pruning. In the training stage, network training is reformulated as an optimization problem with a sparsity constraint, expressed as an l0 norm in the loss function. Since the l0 norm is not differentiable, a sparse-learning approach based on the Alternating Direction Method of Multipliers (ADMM) is proposed to minimize the loss function. Through this optimization, unimportant channels are roughly identified and invalidated, which prevents significant channels from being pruned by mistake and makes the models prunable. In the pruning stage, the task of obtaining the optimal pruned model from the trained model is treated as a multiobjective optimization problem whose objectives are accuracy and compression ratio. To trade off performance against model size, a genetic algorithm is used to fine-tune the structure of the pruned models by searching a suitable pruning rate for each layer (illustrative sketches of both stages follow the contribution list below). The contributions of this work can be summarized as follows:

  • A retraining-free pruning framework, which contains sparse-learning and automatic searching, is proposed for both the fast pruning process and good model performance.

  • An ADMM-based sparse-learning technique is embedded into training to roughly identify and invalidate insignificant channels during training, which prevents important channels from being removed by mistake and makes the DCNN models prunable.

  • A genetic algorithm-based pruning strategy is developed to fine-tune the network structure for a trade-off between performance and model size. This strategy automatically obtains the optimal pruning rates for models while requiring no additional networks to be optimized, unlike other automatic strategies (e.g., GAL [14] and MetaPruning [16]).

  • Extensive experiments on CIFAR-10 and ImageNet demonstrate that the proposed retraining-free pruning method works as well as the competing methods with retraining, on several popular DCNN models, including VGGNet [17], ResNet [18], GoogLeNet [19], and DenseNet [20].
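To make the two stages more concrete, we give two minimal sketches. The first illustrates an ADMM-style update under a channel-level l0 constraint: an auxiliary copy of a convolutional weight is projected so that only the k output channels with the largest norms survive, and the scaled dual variable absorbs the gap, while the weights themselves would be updated by SGD on the augmented loss. The tensor layout, the L2 channel norm, and the function names are assumptions made for illustration, not the paper's exact formulation.

    import torch

    def project_channels_l0(w: torch.Tensor, k: int) -> torch.Tensor:
        """Euclidean projection onto {Z : at most k non-zero output channels}.

        Channels are ranked by their L2 norm; the k strongest channels are kept
        verbatim and the remaining ones are zeroed out (invalidated).
        """
        norms = w.flatten(1).norm(dim=1)                  # one score per output channel
        mask = torch.zeros_like(norms, dtype=torch.bool)
        mask[torch.topk(norms, k).indices] = True
        return w * mask.view(-1, 1, 1, 1)

    def admm_auxiliary_step(w: torch.Tensor, u: torch.Tensor, k: int):
        """One ADMM step for the auxiliary variable Z and the scaled dual U.

        W itself is updated elsewhere by SGD on loss(W) + (rho/2) * ||W - Z + U||^2;
        here Z absorbs the l0 constraint via projection and U accumulates the gap.
        """
        z = project_channels_l0(w + u, k)                 # Z-update: projection step
        u = u + w - z                                     # dual-variable update
        return z, u

    # Example: restrict a 64-channel convolution to 32 active channels.
    w = torch.randn(64, 32, 3, 3)
    z, u = admm_auxiliary_step(w, torch.zeros_like(w), k=32)
    print((z.flatten(1).norm(dim=1) > 0).sum().item())    # 32 channels remain active

The second sketch outlines the pruning stage as a small genetic-algorithm loop over per-layer pruning rates. The evaluate callback, the truncation-style selection (the paper cites roulette-wheel selection), and all hyper-parameters are hypothetical placeholders for the accuracy/compression objectives described above.

    import random

    def ga_search(evaluate, num_layers, pop_size=20, generations=50, alpha=1.0):
        """GA sketch: each chromosome holds one pruning rate per layer.

        evaluate(rates) must return (accuracy, size_ratio) for the model pruned
        with those rates; the fitness trades accuracy against model size.
        """
        def fitness(rates):
            acc, size = evaluate(rates)
            return acc - alpha * size
        pop = [[random.uniform(0.1, 0.9) for _ in range(num_layers)]
               for _ in range(pop_size)]
        for _ in range(generations):
            parents = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
            children = []
            while len(parents) + len(children) < pop_size:
                a, b = random.sample(parents, 2)
                cut = random.randrange(1, num_layers)      # one-point crossover
                child = a[:cut] + b[cut:]
                i = random.randrange(num_layers)           # single-gene mutation
                child[i] = min(0.9, max(0.1, child[i] + random.gauss(0.0, 0.05)))
                children.append(child)
            pop = parents + children
        return max(pop, key=fitness)

    # Toy example: accuracy degrades mildly with pruning, size shrinks with the mean rate.
    toy = lambda rates: (0.95 - 0.1 * max(rates), 1.0 - sum(rates) / len(rates))
    best_rates = ga_search(toy, num_layers=13, generations=10)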

The organization of this paper is as follows. The relevant literature is reviewed and discussed in Section 2. The proposed method is introduced in detail in Section 3. In Section 4, we report and analyze the experimental results. Finally, this paper is concluded in Section 5.

Related work

Unstructured pruning: The unstructured pruning methods can be traced back to Optimal Brain Damage [21] and Optimal Brain Surgeon [22], which pruned network parameters according to the Hessian matrix of the weights. SNIP [5] proposed to prune network parameters based on the derivatives of the weights. Deep Compression [6] and S. Han et al. [23] evaluated the absolute values of the weights and pruned the small ones. These methods often achieve a high compression ratio because they can prune arbitrary weights
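For context, the magnitude criterion used by these works can be sketched in a few lines; the snippet below is a generic illustration under a single global threshold, not the cited authors' implementation, and the 90% sparsity level is arbitrary.

    import torch

    def magnitude_prune(weights, sparsity):
        """Unstructured pruning sketch: zero out the smallest-magnitude weights.

        All tensors are pooled to pick one global threshold, then each tensor is
        masked element-wise. The shapes stay the same but the tensors become
        sparse, which is why special storage formats or hardware are needed.
        """
        all_w = torch.cat([w.abs().flatten() for w in weights])
        k = max(1, int(sparsity * all_w.numel()))          # number of weights to remove
        threshold = torch.kthvalue(all_w, k).values
        return [w * (w.abs() > threshold) for w in weights]

    # Example: prune roughly 90% of the weights of two random layers.
    layers = [torch.randn(64, 32, 3, 3), torch.randn(128, 64)]
    pruned = magnitude_prune(layers, 0.9)
    print(sum((p == 0).sum().item() for p in pruned))      # about 90% of all entries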

Proposed method

This section describes the proposed pruning method in detail. First, the motivations of the proposed method are introduced in Section 3.1. Second, the overall framework of the method is described in Section 3.2. Third, the training approach is introduced in Section 3.3. Fourth, the automatic pruning strategy is proposed in Section 3.4. Finally, the special operations for pruning networks with cross-layer connections are illustrated in Section 3.5.

Experiments

In this section, the performance of the proposed method is demonstrated and analyzed through a series of experiments. The experimental settings are introduced in Section 4.1. The practicality of the proposed RFPruning is evaluated in Section 4.2. In addition, the ablation experiments are conducted in Section 4.3 to analyze the effects of the proposed sparse-learning scheme and automatic pruning strategy in more detail.

Conclusion

In this paper, we propose a retraining-free pruning method, named RFPruning, to compress DCNNs quickly while maintaining their performance. The network pruning task is formulated as two optimization problems, which are solved by ADMM-based sparse learning and a genetic algorithm, respectively. Experimental results showed that the ADMM-based sparse learning is beneficial for making the DCNN models prunable and the application of the genetic algorithm is effective in improving

CRediT authorship contribution statement

Zhenyu Wang: Conceptualization, Methodology, Software, Investigation, Writing – original draft, Writing – review & editing. Xuemei Xie: Writing – review & editing, Resources, Project administration, Funding acquisition. Guangming Shi: Supervision, Resources.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grants 61632019 and 61836008, and in part by the National Key R&D Program of China under Grant 2020AAA0109301.

References (58)

  • A. Lipowski et al., Roulette-wheel selection via stochastic acceptance, Physica A (2012).

  • Z. Wang et al., Network pruning using sparse learning and genetic algorithm, Neurocomputing (2020).

  • M. Denil, B. Shakibi, L. Dinh, M. Ranzato, N. de Freitas, Predicting parameters in deep learning, in: Proceedings of...

  • M. Jaderberg et al., Speeding up convolutional neural networks with low rank expansions.

  • H. Yang et al., Energy-constrained compression for deep neural networks via weighted sparse projection and layer input masking.

  • H. Yang et al., ECC: Platform-independent energy-constrained deep neural network compression via a bilinear regression model.

  • N. Lee et al., SNIP: Single-shot network pruning based on connection sensitivity.

  • S. Han et al., Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding.

  • H. Li et al., Pruning filters for efficient ConvNets.

  • S. Anwar et al., Structured pruning of deep convolutional neural networks, ACM J. Emerg. Technol. Comput. Syst. (JETC) (2017).

  • P. Molchanov et al., Pruning convolutional neural networks for resource efficient inference.

  • J. Ye et al., Rethinking the smaller-norm-less-informative assumption in channel pruning of convolution layers.

  • W. Wen et al., Learning structured sparsity in deep neural networks.

  • Z. Liu et al., Rethinking the value of network pruning.

  • X. Ding et al., Lossless CNN channel pruning via decoupling remembering and forgetting (2020).

  • S. Lin et al., Towards optimal structured CNN pruning via generative adversarial learning.

  • C. Zhao et al., Variational convolutional neural network pruning.

  • Z. Liu, H. Mu, X. Zhang, Z. Guo, X. Yang, K.-T. Cheng, J. Sun, Metapruning: Meta learning for automatic neural network...

  • K. Simonyan et al., Very deep convolutional networks for large-scale image recognition, Comput. Sci. (2014).

  • K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE conference...

  • C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with...

  • G. Huang, Z. Liu, L. Van Der Maaten, K.Q. Weinberger, Densely connected convolutional networks, in: Proceedings of the...

  • Y. LeCun et al., Optimal brain damage.

  • B. Hassibi et al., Second order derivatives for network pruning: Optimal brain surgeon.

  • S. Han et al., Learning both weights and connections for efficient neural network.

  • P. Molchanov et al., Importance estimation for neural network pruning.

  • J.-H. Luo, J. Wu, W. Lin, Thinet: A filter level pruning method for deep neural network compression, in: Proceedings of...

  • Y. He et al., Filter pruning via geometric median for deep convolutional neural networks acceleration.

  • H. Hu et al., Network trimming: A data-driven neuron pruning approach towards efficient deep architectures (2016).