Abstract
Existing filter pruning methods mostly rely on specific data-driven paradigms but lack interpretability. Moreover, they usually assign layer-wise compression ratios under a given FLOPs budget either automatically, via neural architecture search algorithms, or manually, both of which are inefficient. In this paper, we propose a novel interpretable task-inspired adaptive filter pruning method for neural networks to solve these problems. First, we treat filters as semantic detectors and develop task-inspired importance criteria by evaluating correlations between input tasks and feature maps, and by observing the information flow through filters between adjacent layers. Second, we draw on human neurobiological mechanisms for better interpretability, where the retained first-layer filters act as individual information receivers. Third, motivated by the observation that each filter has a deterministic impact on FLOPs and network parameters, we provide an efficient adaptive compression-ratio allocation strategy based on a differentiable pruning approximation under multiple budget constraints, while also considering the performance objective. The proposed method is validated with extensive experiments on state-of-the-art neural networks; it significantly outperforms existing filter pruning methods and achieves the best trade-off between network compression and task performance. With ResNet-50 on ImageNet, our approach removes 75.49% of parameters and 70.90% of FLOPs with only 2.31% performance degradation.
Data Availability Statement
The datasets generated during and/or analysed during the current study are available in CIFAR-10 at https://www.cs.toronto.edu/~kriz/cifar.html and ImageNet (ILSVRC2012) at https://www.image-net.org/challenges/LSVRC/index.php.
References
Bau, D., Zhou, B., Khosla, A., et al. (2017). Network dissection: Quantifying interpretability of deep visual representations. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 3319–3327).
Bau, D., Zhu, J. Y., Strobelt, H., et al. (2020). Understanding the role of individual units in a deep neural network. Proceedings of the National Academy of Sciences (PNAS), 117(48), 30071–30078.
Chan, L., Hosseini, M. S., & Plataniotis, K. N. (2021). A comprehensive analysis of weakly-supervised semantic segmentation in different image domains. International Journal of Computer Vision (IJCV), 129(2), 361–384.
Chen, H., Zhuo, L., Zhang, B., et al. (2021). Binarized neural architecture search for efficient object recognition. International Journal of Computer Vision (IJCV), 129(2), 501–516.
Chin, T. W., Ding, R., Zhang, C., et al. (2020). Towards efficient model compression via learned global ranking. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1518–1528).
Crick, F., & Koch, C. (1995). Are we aware of neural activity in primary visual cortex? Nature, 375(6527), 121–123.
Ding, X., Ding, G., Guo, Y., et al. (2019). Centripetal SGD for pruning very deep convolutional networks with complicated structure. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 4943–4953).
Ding, X., Hao, T., Tan, J., et al. (2021). Resrep: Lossless CNN pruning via decoupling remembering and forgetting. In IEEE international conference on computer vision (ICCV) (pp. 4510–4520).
Dong, X., & Yang, Y. (2019). Network pruning via transformable architecture search. In Neural information processing systems (NeurIPS).
Dong, Y., Ni, R., Li, J., et al. (2019). Stochastic quantization for learning accurate low-bit deep neural networks. International Journal of Computer Vision (IJCV), 127(11), 1629–1642.
Fan, S., Gao, W., & Li, G. (2022). Salient object detection for point clouds. In European conference on computer vision (pp. 1–19). Springer.
Fu, C., Li, G., Song, R., et al. (2022). Octattention: Octree-based large-scale contexts model for point cloud compression. In Proceedings of the AAAI conference on artificial intelligence (pp. 625–633).
Gao, W., Tao, L., Zhou, L., et al. (2020). Low-rate image compression with super-resolution learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops (pp. 154–155).
Gao, W., Liao, G., Ma, S., et al. (2021). Unified information fusion network for multi-modal RGB-D and RGB-T salient object detection. IEEE Transactions on Circuits and Systems for Video Technology, 32(4), 2091–2106.
Gao, W., Zhou, L., & Tao, L. (2021). A fast view synthesis implementation method for light field applications. ACM Transactions on Multimedia Computing Communications and Applications (TOMM), 17(4), 1–20.
Gao, W., Guo, Y., Ma, S., et al. (2022). Efficient neural network compression inspired by compressive sensing. IEEE Transactions on Neural Networks and Learning Systems. https://doi.org/10.1109/TNNLS.2022.3186008
Gao, W., Ye, H., Li, G., et al. (2022b). Openpointcloud: An open-source algorithm library of deep learning based point cloud compression. In Proceedings of the 30th ACM international conference on multimedia (pp. 7347–7350).
Gao, W., Fan, S., Li, G., et al. (2023). A thorough benchmark and a new model for light field saliency detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(7), 8003–8019. https://doi.org/10.1109/TPAMI.2023.3235415
Geng, C., Huang, S. J., & Chen, S. (2021). Recent advances in open set recognition: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 43(10), 3614–3631.
Gross, C. G. (2002). Genealogy of the “grandmother cell.” The Neuroscientist, 8(5), 512–518.
Guo, S., Wang, Y., Li, Q., et al. (2020). Dmcp: Differentiable markov channel pruning for neural networks. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1536–1544).
Guo, Y., & Gao, W. (2022). Semantic-driven automatic filter pruning for neural networks. In 2022 IEEE international conference on multimedia and expo (ICME) (pp. 1–6). IEEE.
He, K., Zhang, X., Ren, S., et al. (2016). Deep residual learning for image recognition. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 770–778).
He, Y., Lin, J., Liu, Z., et al. (2018). Amc: Automl for model compression and acceleration on mobile devices. In European conference on computer vision (ECCV) (pp. 784–800).
He, Y., Ding, Y., Liu, P., et al. (2020). Learning filter pruning criteria for deep convolutional neural networks acceleration. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 2006–2015).
Huang, Z., & Wang, N. (2018). Data-driven sparse structure selection for deep neural networks. In European conference on computer vision (ECCV) (pp. 304–320).
Krizhevsky, A., Hinton, G., et al. (2009). Learning multiple layers of features from tiny images. Citeseer.
Krizhevsky, A., Sutskever, I., Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In Neural information processing systems (NeurIPS).
Li, B., Wu, B., Su, J., et al. (2020). Eagleeye: Fast sub-net evaluation for efficient neural network pruning. In European conference on computer vision (ECCV) (pp. 639–654). Springer.
Li, H., Kadav, A., Durdanovic, I., et al. (2017). Pruning filters for efficient convnets. In International conference on learning representations (ICLR).
Li, Y., Lin, S., Zhang, B., et al. (2019). Exploiting kernel sparsity and entropy for interpretable CNN compression. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 2795–2804).
Lin, M., Ji, R., Wang, Y., et al. (2020a). Hrank: Filter pruning using high-rank feature map. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1526–1535).
Lin, M., Ji, R., Zhang, Y., et al. (2020b). Channel pruning via automatic structure search. In International joint conference on artificial intelligence (IJCAI).
Lin, M., Cao, L., Li, S., et al. (2022). Filter sketch for network pruning. IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 33(12), 7091–7100.
Lin, M., Cao, L., Zhang, Y., et al. (2022b). Pruning networks with cross-layer ranking & k-reciprocal nearest filters. IEEE Transactions on Neural Networks and Learning Systems (TNNLS), pp. 1–10.
Lin, M., Ji, R., Li, S., et al. (2022). Network pruning using adaptive exemplar filters. IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 33(12), 7357–7366.
Lin, S., Ji, R., Yan, C., et al. (2019). Towards optimal structured CNN pruning via generative adversarial learning. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 2785–2794).
Liu, J., Zhuang, B., Zhuang, Z., et al. (2021a). Discrimination-aware network pruning for deep model compression. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), pp. 1–1.
Liu, L., Ouyang, W., Wang, X., et al. (2020). Deep learning for generic object detection: A survey. International journal of computer vision (IJCV), 128(2), 261–318.
Liu, P., Yuan, W., Fu, J., et al. (2021b). Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. arXiv preprint arXiv:2107.13586
Liu, Z., Mu, H., Zhang, X., et al. (2019). Metapruning: Meta learning for automatic neural network channel pruning. In IEEE international conference on computer vision (ICCV) (pp. 3296–3305).
Liu, Z., Luo, W., Wu, B., et al. (2020). Bi-real net: Binarizing deep network towards real-network performance. International Journal of Computer Vision (IJCV), 128(1), 202–219.
Lohscheller, H. (1984). A subjectively adapted image communication system. IEEE Transactions on Communications (TCOM), 32(12), 1316–1322.
Long, S., He, X., & Yao, C. (2021). Scene text detection and recognition: The deep learning era. International Journal of Computer Vision (IJCV), 129(1), 161–184.
Luo, J. H., Wu, J., & Lin, W. (2017). Thinet: A filter level pruning method for deep neural network compression. In IEEE international conference on computer vision (ICCV) (pp. 5068–5076).
Minaee, S., Boykov, Y., Porikli, F., et al. (2022). Image segmentation using deep learning: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 44(7), 3523–3542.
Molchanov, P., Tyree, S., Karras, T., et al. (2017). Pruning convolutional neural networks for resource efficient inference. In International conference of learning representation (ICLR).
Nguyen, A., Yosinski, J., & Clune, J. (2015). Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 427–436).
Ning, X., Zhao, T., Li, W., et al. (2020). Dsa: More efficient budgeted pruning via differentiable sparsity allocation. In European conference on computer vision (ECCV) (pp. 592–607). Springer.
Nirenberg, S., Carcieri, S. M., Jacobs, A. L., et al. (2001). Retinal ganglion cells act largely as independent encoders. Nature, 411(6838), 698–701.
Olshausen, B. A., & Field, D. J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583), 607–609.
Otter, D. W., Medina, J. R., & Kalita, J. K. (2021). A survey of the usages of deep learning for natural language processing. IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 32(2), 604–624.
Paszke, A., Gross, S., Chintala, S., et al. (2017). Automatic differentiation in pytorch. In Neural information processing systems (NeurIPS).
Reich, D. S., Mechler, F., & Victor, J. D. (2001). Independent and redundant information in nearby cortical neurons. Science, 294(5551), 2566–2568.
Russakovsky, O., Deng, J., Su, H., et al. (2015). Imagenet large scale visual recognition challenge. International Journal of Computer Vision (IJCV), 115(3), 211–252.
Sandler, M., Howard, A., Zhu, M., et al. (2018). Mobilenetv2: Inverted residuals and linear bottlenecks. In IEEE conference on computer vision and pattern recognition (CVPR).
Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556
Szegedy, C., Zaremba, W., Sutskever, I., et al. (2013). Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199
Tao, L., & Gao, W. (2021). Efficient channel pruning based on architecture alignment and probability model bypassing. In 2021 IEEE international conference on systems, man, and cybernetics (SMC) (pp. 3232–3237).
Tao, L., Gao, W., Li, G., et al. (2023). Adanic: Towards practical neural image compression via dynamic transform routing. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 16,879–16,888).
Wang, Y., Zhang, X., Xie, L., et al. (2020). Pruning from scratch. In Proceedings of the AAAI conference on artificial intelligence (AAAI) (pp. 12,273–12,280).
Wang, Z., Li, C., & Wang, X. (2021). Convolutional neural network pruning with structural redundancy reduction. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 14,908–14,917).
Wu, Y., Qi, Z., Zheng, H., et al. (2021). Deep image compression with latent optimization and piece-wise quantization approximation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 1926–1930).
Yao, K., Cao, F., Leung, Y., et al. (2021). Deep neural network compression through interpretability-based filter pruning. Pattern Recognition (PR), 119, 108056.
Zhang, N., Pan, Z., Li, T.H., et al. (2023). Improving graph representation for point cloud segmentation via attentive filtering. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 1244–1254).
Zhang, Q., Wang, X., Wu, Y. N., et al. (2021). Interpretable CNNs for object classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(10), 3416–3431.
Zhang, R., Gao, W., Li, G., et al. (2022). Qinet: Decision surface learning and adversarial enhancement for quasi-immune completion of diverse corrupted point clouds. IEEE Transactions on Geoscience and Remote Sensing, 60, 1–14.
Zhang, X. Y., Liu, C. L., & Suen, C. Y. (2020). Towards robust pattern recognition: A review. Proceedings of the IEEE, 108(6), 894–922.
Zhang, Y., Tiňo, P., Leonardis, A., et al. (2021). A survey on neural network interpretability. IEEE Transactions on Emerging Topics in Computational Intelligence (TETC), 5(5), 726–742.
Zhang, Y., Lin, M., Lin, C. W., et al. (2022). Carrying out CNN channel pruning in a white box. IEEE Transactions on Neural Networks and Learning Systems (TNNLS). https://doi.org/10.1109/TNNLS.2022.3147269
Zhou, B., Khosla, A., Lapedriza, A., et al. (2015). Object detectors emerge in deep scene CNNs. In International conference on learning representations (ICLR).
Zhou, B., Bau, D., Oliva, A., et al. (2019). Interpreting deep visual representations via network dissection. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 41(9), 2131–2145.
Acknowledgements
This work was supported by Natural Science Foundation of China (62271013, 62031013), Shenzhen Fundamental Research Program (GXWD20201231165807007-20200806163656003), and Shenzhen Science and Technology Plan Basic Research Project (JCYJ20230807120808017).
Additional information
Communicated by Arun Mallya.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: Proof of the Phenomenon Mentioned in Sect. 3.4
To prove that filters in the same layer have the same, deterministic influence on the total parameters and FLOPs of the network, we describe the weight parameters of the t-th layer as a four-dimensional (4D) tensor \(\textbf{W}_{t}\in \mathbb {R}^{n_{t} \times n_{t-1} \times k_{t} \times k_{t}}\), where \(k_t\) is the kernel size, and \(n_{t-1}\) and \(n_{t}\) are the numbers of input and output channels, respectively.
As for the total parameters, as vividly shown in Fig. 3, if a filter in the t-th layer is pruned, we have:
$$\Delta P_{total} = n_{t-1} \cdot k_{t} \cdot k_{t} + n_{t+1} \cdot k_{t+1} \cdot k_{t+1},$$
where \(n_{t-1} \cdot k_{t} \cdot k_{t}\) is the number of parameters of the pruned filter in the t-th layer, and \(n_{t+1} \cdot k_{t+1} \cdot k_{t+1}\) is the number of parameters removed because every filter in the \((t+1)\)-th layer loses one input dimension. Obviously, \(\Delta P_{total}\) is a deterministic value.
Then, according to Molchanov et al. (2017), the FLOPs of the t-th layer can be written as:
$$F_{t} = n_{t} \cdot n_{t-1} \cdot k_{t} \cdot k_{t} \cdot H_{t} \cdot W_{t},$$
where \(H_{t}\) and \(W_{t}\) are the height and width of the output feature maps.
If a filter in the t-th layer is pruned, the number of input channels in the \((t+1)\)-th layer is reduced by one, and hence every filter in the \((t+1)\)-th layer is also reduced by one dimension. The resulting changes in FLOPs are as follows.
For the t-th layer:
$$\Delta F_{t} = n_{t-1} \cdot k_{t} \cdot k_{t} \cdot H_{t} \cdot W_{t}.$$
For the \((t+1)\)-th layer:
$$\Delta F_{t+1} = n_{t+1} \cdot k_{t+1} \cdot k_{t+1} \cdot H_{t+1} \cdot W_{t+1}.$$
Hence, the total FLOPs will be changed by:
$$\Delta F_{total} = \Delta F_{t} + \Delta F_{t+1} = n_{t-1} \cdot k_{t} \cdot k_{t} \cdot H_{t} \cdot W_{t} + n_{t+1} \cdot k_{t+1} \cdot k_{t+1} \cdot H_{t+1} \cdot W_{t+1},$$
where the value of \(\Delta F_{total}\) is also deterministic: the total FLOPs change by a fixed amount whenever any filter in the t-th layer is pruned.
In summary, since \(\Delta P_{total}\) and \(\Delta F_{total}\) are deterministic, filters in the same layer have the same and deterministic influence on total parameters and FLOPs of the network.
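This bookkeeping can be checked numerically. The sketch below (all function and variable names are ours, not from the paper) compares the closed-form \(\Delta P_{total}\) and \(\Delta F_{total}\) against a brute-force parameter and FLOPs count for a toy pair of adjacent convolutional layers:

```python
# Sketch: verify that pruning one filter in layer t changes the totals by
# the deterministic amounts derived in Appendix A. Names are illustrative.

def conv_params(n_out, n_in, k):
    # parameters of one conv layer (biases ignored, as in the appendix)
    return n_out * n_in * k * k

def conv_flops(n_out, n_in, k, h, w):
    # FLOPs of one conv layer: n_out * n_in * k^2 * H * W
    return n_out * n_in * k * k * h * w

# toy two-layer network: layer t and layer t+1
n_prev, n_t, n_next = 16, 32, 64        # n_{t-1}, n_t, n_{t+1}
k_t, k_next = 3, 3                      # kernel sizes
h_t, w_t = 28, 28                       # output size of layer t
h_next, w_next = 28, 28                 # output size of layer t+1

def totals(n_t_filters):
    p = conv_params(n_t_filters, n_prev, k_t) \
        + conv_params(n_next, n_t_filters, k_next)
    f = conv_flops(n_t_filters, n_prev, k_t, h_t, w_t) \
        + conv_flops(n_next, n_t_filters, k_next, h_next, w_next)
    return p, f

p_before, f_before = totals(n_t)
p_after, f_after = totals(n_t - 1)      # prune one filter in layer t

# closed forms from Appendix A
delta_p = n_prev * k_t * k_t + n_next * k_next * k_next
delta_f = n_prev * k_t * k_t * h_t * w_t \
          + n_next * k_next * k_next * h_next * w_next

assert p_before - p_after == delta_p
assert f_before - f_after == delta_f
```

Because the deltas depend only on layer shapes, not on which filter is removed, every filter in a layer shares the same deterministic cost, which is exactly what the adaptive allocation strategy exploits.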
Appendix B: Proof of the Rationality of Eq. (13)
\({\varvec{\Phi }}_t^i \in (0,1)\) in Eq. (11) represents the probability that there are i remaining filters in the t-th layer.
We know:
$$\sum _{i=1}^{n_t} {\varvec{\Phi }}_t^i = 1, \qquad {\varvec{\Phi }}_t^i \in (0,1).$$
Therefore, we have:
$$1 \cdot \sum _{i=1}^{n_t} {\varvec{\Phi }}_t^i< \sum _{i=1}^{n_t} i \cdot {\varvec{\Phi }}_t^i < n_t \cdot \sum _{i=1}^{n_t} {\varvec{\Phi }}_t^i.$$
Thus:
$$1< \sum _{i=1}^{n_t} i \cdot {\varvec{\Phi }}_t^i < n_t.$$
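As a quick numeric sanity check of this range, the following sketch assumes (our assumption for illustration, not a detail stated here) that \({\varvec{\Phi }}_t\) is produced by a softmax over logits \({\varvec{\Theta }}_t\), which satisfies \({\varvec{\Phi }}_t^i \in (0,1)\) and \(\sum _i {\varvec{\Phi }}_t^i = 1\):

```python
# Sketch: the Phi_t-weighted expected filter count lies strictly in (1, n_t).
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

random.seed(0)
n_t = 8
theta = [random.uniform(-3, 3) for _ in range(n_t)]
phi = softmax(theta)                    # Phi_t^i for i = 1..n_t

# sum_i i * Phi_t^i: the relaxed (expected) number of remaining filters
expected_filters = sum((i + 1) * p for i, p in enumerate(phi))

assert abs(sum(phi) - 1.0) < 1e-9       # Phi_t is a valid distribution
assert 1.0 < expected_filters < n_t     # the relaxed count lies in (1, n_t)
```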
Since \(\textbf{R}_t \in \mathbb {R}^{n_t}\) and at least one filter must be reserved in each layer (\(\Vert \textbf{R}_t\Vert _0\ge 1\)), we have:
$$1 \le \Vert \textbf{R}_t\Vert _0 \le n_t.$$
Therefore, the value ranges of \(\sum _{i=1}^{n_t} i \cdot {\varvec{\Phi }}_t^i\) and \(\Vert \textbf{R}_t\Vert _0\) are similar (\(\sum _{i=1}^{n_t} i \cdot {\varvec{\Phi }}_t^i\) is a continuous relaxation of \(\Vert \textbf{R}_t\Vert _0\)). Thus, we make the following reasonable definition in Eq. (13) of the full paper:
$$\Vert \textbf{R}_t\Vert _0 \approx \sum _{i=1}^{n_t} i \cdot {\varvec{\Phi }}_t^i.$$
Appendix C: Proof of the Differentiability of the Loss Function in Eq. (16)
Since the trainable parameters in each layer are \(\textbf{W}_t\) and \({\varvec{\Theta }}_t\), we need to prove that each term of the loss function \(\mathcal {L}\) is differentiable with respect to them. First, \({\mathcal {\widetilde{L}}_{FLOPs}}\) and \(\mathcal {\widetilde{L}}_{Params}\) are obviously differentiable with respect to \({\varvec{\Theta }}_t\), and \(\mathcal {L}_{task}\) is naturally differentiable with respect to \(\textbf{W}_t\). However, the \(\mathcal {L}_{task}\) term must not only find the optimal weights \(\textbf{W}_t\) but also update \({\varvec{\Theta }}_t\). Therefore, the key problem is how to guarantee the differentiability of \(\mathcal {L}_{task}\) with respect to \({\varvec{\Theta }}_t\).
If \(\mathcal {L}_{task}\) is the cross-entropy loss of a classification model with C classes, we have:
$$\mathcal {L}_{task} = -\sum _{c=1}^{C} y_c \log \hat{y}_c,$$
where \(y_c\) is the ground truth and \(\hat{y}_c\) is the prediction.
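A minimal numeric illustration of this loss (the example probabilities are ours):

```python
# Sketch: cross-entropy with a one-hot ground truth reduces to -log(p_true).
import math

def cross_entropy(y, y_hat):
    # L_task = -sum_c y_c * log(y_hat_c)
    return -sum(yc * math.log(pc) for yc, pc in zip(y, y_hat))

y = [0.0, 1.0, 0.0]          # ground truth: class 2 of C = 3
y_hat = [0.2, 0.7, 0.1]      # predicted class probabilities
loss = cross_entropy(y, y_hat)
assert abs(loss - (-math.log(0.7))) < 1e-12
```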
We design a mixture of all the possible pruning masks weighted by \({\varvec{\Phi }}_t\) in Eq. (18) as:
$${\varvec{\mathcal {R}}}_t = \sum _{i=1}^{n_t} {\varvec{\Phi }}_t^i \cdot determine\left( i,{\varvec{\Gamma }}_t\right).$$
When \({\varvec{\Gamma }}_t\) is calculated, \(determine\left( i,{\varvec{\Gamma }}_t\right) \in \mathbb {R}^{n_t}\) will be certain. For example, if \(max\left( {\varvec{\Gamma }}_t\right) ={\varvec{\Gamma }}_t^2\), \(determine\left( 1,{\varvec{\Gamma }}_t\right) =(0,1,0,\cdots ,0)\) (Only the second element is 1, and the other elements are 0). Then \({\varvec{\mathcal {R}}}_t \in \mathbb {R}^{n_t}\) is only related to \({\varvec{\Theta }}_t\) since all the \(determine\left( i,{\varvec{\Gamma }}_t\right) , 1\le i\le n_t\) are certain. We consider \({\varvec{\mathcal {R}}}_t\) as the approximation of \(\textbf{R}_t\).
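The mask mixture can be sketched as follows; `determine` and `soft_mask` are our illustrative reconstruction (keeping the \(i\) filters with the largest entries of \({\varvec{\Gamma }}_t\)), not the authors' released code:

```python
# Sketch: Phi_t-weighted mixture of all n_t possible top-i pruning masks.
def determine(i, gamma):
    # binary mask selecting the i largest entries of gamma
    top = sorted(range(len(gamma)), key=lambda j: gamma[j], reverse=True)[:i]
    return [1.0 if j in top else 0.0 for j in range(len(gamma))]

def soft_mask(phi, gamma):
    # R_t = sum_i Phi_t^i * determine(i, Gamma_t)
    n = len(gamma)
    mask = [0.0] * n
    for i in range(1, n + 1):
        m = determine(i, gamma)           # constant once gamma is fixed
        mask = [a + phi[i - 1] * b for a, b in zip(mask, m)]
    return mask

gamma = [0.1, 0.9, 0.4, 0.2]              # saliency scores Gamma_t
phi = [0.1, 0.6, 0.2, 0.1]                # Phi_t^i: P(i filters remain)

# determine(1, .) keeps only the highest-saliency filter (index 1 here)
assert determine(1, gamma) == [0.0, 1.0, 0.0, 0.0]

r = soft_mask(phi, gamma)
# the mask's total weight equals the expected filter count sum_i i*Phi_t^i
assert abs(sum(r) - sum((i + 1) * p for i, p in enumerate(phi))) < 1e-9
```

Note that the highest-saliency filter appears in every mask, so its mixture weight is (up to floating point) exactly 1.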
We re-weight the original feature maps \(\textbf{O}_t\) of the t-th layer (\(\textbf{O}_t^i\) is the feature map generated by the i-th filter) by \({\varvec{\mathcal {R}}}_t\) as shown in Eq. (19): \({\varvec{\mathcal {R}}}_t\) executes a dot product with each feature map in \(\textbf{O}_t\), and \({\varvec{\mathcal {O}}}_t\) is the differentiable approximation of the pruned feature maps of the t-th layer.
We know that \(\hat{y}_c\) is originally generated, layer by layer, from the feature maps \(\textbf{O}_t\). We now generate \(\hat{y}_c\) from \({\varvec{\mathcal {O}}}_t\) instead, which preserves differentiability with respect to \({\varvec{\Theta }}_t\) layer by layer. In this way, the differentiability of \(\mathcal {L}_{task}\) with respect to \({\varvec{\Theta }}_t\) is guaranteed. In summary, this completes the proof that each term of our loss function is differentiable with respect to the involved parameters.
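The argument can be exercised end-to-end in a small sketch (our construction, not the paper's code: \({\varvec{\Phi }}_t\) as a softmax over \({\varvec{\Theta }}_t\), and a scalar surrogate for the task loss). Since the masks \(determine(i,{\varvec{\Gamma }}_t)\) are constants once \({\varvec{\Gamma }}_t\) is fixed, a finite-difference probe shows a nonzero gradient of the loss with respect to \({\varvec{\Theta }}_t\):

```python
# Sketch: the re-weighted features depend smoothly on Theta_t, so the task
# loss has a usable gradient with respect to Theta_t.
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def determine(i, gamma):
    # binary mask keeping the i largest entries of gamma (constant w.r.t. theta)
    top = sorted(range(len(gamma)), key=lambda j: gamma[j], reverse=True)[:i]
    return [1.0 if j in top else 0.0 for j in range(len(gamma))]

def task_loss(theta, gamma, feats, target):
    phi = softmax(theta)                  # Phi_t from Theta_t (our assumption)
    n = len(gamma)
    mask = [0.0] * n                      # soft mask: mixture of fixed masks
    for i in range(1, n + 1):
        m = determine(i, gamma)
        mask = [a + phi[i - 1] * b for a, b in zip(mask, m)]
    out = sum(r * f for r, f in zip(mask, feats))  # re-weighted features
    return (out - target) ** 2            # scalar stand-in for cross-entropy

theta = [0.3, -0.1, 0.8]
gamma = [0.2, 0.9, 0.5]
feats = [1.0, 4.0, 2.0]

eps = 1e-6
base = task_loss(theta, gamma, feats, 3.0)
bumped = task_loss([theta[0] + eps] + theta[1:], gamma, feats, 3.0)
grad0 = (bumped - base) / eps             # finite-difference dL/dTheta_t^1

assert grad0 != 0.0                       # gradients flow back to Theta_t
```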
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Guo, Y., Gao, W. & Li, G. Interpretable Task-inspired Adaptive Filter Pruning for Neural Networks Under Multiple Constraints. Int J Comput Vis 132, 2060–2076 (2024). https://doi.org/10.1007/s11263-023-01972-x