Interpretable Task-inspired Adaptive Filter Pruning for Neural Networks Under Multiple Constraints

Published in International Journal of Computer Vision

Abstract

Existing filter pruning methods mostly rely on specific data-driven paradigms but lack interpretability. Moreover, these approaches usually assign layer-wise compression ratios under a given FLOPs budget either automatically, via neural architecture search algorithms, or simply by hand, both of which are inefficient. In this paper, we propose a novel interpretable task-inspired adaptive filter pruning method for neural networks to solve the above problems. First, we treat filters as semantic detectors and develop task-inspired importance criteria by evaluating the correlations between input tasks and feature maps, and by observing the information flow through filters between adjacent layers. Second, we draw on human neurobiological mechanisms for better interpretability, where the retained first-layer filters act as individual information receivers. Third, inspired by the observation that each filter has a deterministic impact on FLOPs and network parameters, we provide an efficient adaptive compression-ratio allocation strategy based on a differentiable pruning approximation under multiple budget constraints, while also accounting for the performance objective. The proposed method is validated with extensive experiments on state-of-the-art neural networks; it significantly outperforms existing filter pruning methods and achieves the best trade-off between network compression and task performance. With ResNet-50 on ImageNet, our approach removes 75.49% of the parameters and 70.90% of the FLOPs while suffering only a 2.31% performance degradation.



Data Availability Statement

The datasets generated during and/or analysed during the current study are available in CIFAR-10 at https://www.cs.toronto.edu/~kriz/cifar.html and ImageNet (ILSVRC2012) at https://www.image-net.org/challenges/LSVRC/index.php.

References

  • Bau, D., Zhou, B., Khosla, A., et al. (2017). Network dissection: Quantifying interpretability of deep visual representations. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 3319–3327).

  • Bau, D., Zhu, J. Y., Strobelt, H., et al. (2020). Understanding the role of individual units in a deep neural network. Proceedings of the National Academy of Sciences (PNAS), 117(48), 30071–30078.

  • Chan, L., Hosseini, M. S., & Plataniotis, K. N. (2021). A comprehensive analysis of weakly-supervised semantic segmentation in different image domains. International Journal of Computer Vision (IJCV), 129(2), 361–384.

  • Chen, H., Zhuo, L., Zhang, B., et al. (2021). Binarized neural architecture search for efficient object recognition. International Journal of Computer Vision (IJCV), 129(2), 501–516.

  • Chin, T. W., Ding, R., Zhang, C., et al. (2020). Towards efficient model compression via learned global ranking. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1518–1528).

  • Crick, F., & Koch, C. (1995). Are we aware of neural activity in primary visual cortex? Nature, 375(6527), 121–123.

  • Ding, X., Ding, G., Guo, Y., et al. (2019). Centripetal SGD for pruning very deep convolutional networks with complicated structure. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 4943–4953).

  • Ding, X., Hao, T., Tan, J., et al. (2021). Resrep: Lossless CNN pruning via decoupling remembering and forgetting. In IEEE international conference on computer vision (ICCV) (pp. 4510–4520).

  • Dong, X., & Yang, Y. (2019). Network pruning via transformable architecture search. In Neural information processing systems (NeurIPS).

  • Dong, Y., Ni, R., Li, J., et al. (2019). Stochastic quantization for learning accurate low-bit deep neural networks. International Journal of Computer Vision (IJCV), 127(11), 1629–1642.

  • Fan, S., Gao, W., & Li, G. (2022). Salient object detection for point clouds. In European conference on computer vision (pp. 1–19). Springer.

  • Fu, C., Li, G., Song, R., et al. (2022). Octattention: Octree-based large-scale contexts model for point cloud compression. In Proceedings of the AAAI conference on artificial intelligence (pp. 625–633).

  • Gao, W., Tao, L., Zhou, L., et al. (2020). Low-rate image compression with super-resolution learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops (pp. 154–155).

  • Gao, W., Liao, G., Ma, S., et al. (2021). Unified information fusion network for multi-modal RGB-D and RGB-T salient object detection. IEEE Transactions on Circuits and Systems for Video Technology, 32(4), 2091–2106.

  • Gao, W., Zhou, L., & Tao, L. (2021). A fast view synthesis implementation method for light field applications. ACM Transactions on Multimedia Computing Communications and Applications (TOMM), 17(4), 1–20.

  • Gao, W., Guo, Y., Ma, S., et al. (2022). Efficient neural network compression inspired by compressive sensing. IEEE Transactions on Neural Networks and Learning Systems. https://doi.org/10.1109/TNNLS.2022.3186008

  • Gao, W., Ye, H., Li, G., et al. (2022b). Openpointcloud: An open-source algorithm library of deep learning based point cloud compression. In Proceedings of the 30th ACM international conference on multimedia (pp. 7347–7350).

  • Gao, W., Fan, S., Li, G., et al. (2023). A thorough benchmark and a new model for light field saliency detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(7), 8003–8019. https://doi.org/10.1109/TPAMI.2023.3235415

  • Geng, C., Huang, S. J., & Chen, S. (2021). Recent advances in open set recognition: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 43(10), 3614–3631.

  • Gross, C. G. (2002). Genealogy of the “grandmother cell.” The Neuroscientist, 8(5), 512–518.

  • Guo, S., Wang, Y., Li, Q., et al. (2020). Dmcp: Differentiable markov channel pruning for neural networks. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1536–1544).

  • Guo, Y., & Gao, W. (2022). Semantic-driven automatic filter pruning for neural networks. In 2022 IEEE international conference on multimedia and expo (ICME) (pp. 1–6). IEEE.

  • He, K., Zhang, X., Ren, S., et al. (2016). Deep residual learning for image recognition. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 770–778).

  • He, Y., Lin, J., Liu, Z., et al. (2018). Amc: Automl for model compression and acceleration on mobile devices. In European conference on computer vision (ECCV) (pp. 784–800).

  • He, Y., Ding, Y., Liu, P., et al. (2020). Learning filter pruning criteria for deep convolutional neural networks acceleration. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 2006–2015).

  • Huang, Z., & Wang, N. (2018). Data-driven sparse structure selection for deep neural networks. In European conference on computer vision (ECCV) (pp. 304–320).

  • Krizhevsky, A., Hinton, G., et al. (2009). Learning multiple layers of features from tiny images. Citeseer.

  • Krizhevsky, A., Sutskever, I., Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In Neural information processing systems (NeurIPS).

  • Li, B., Wu, B., Su, J., et al. (2020). Eagleeye: Fast sub-net evaluation for efficient neural network pruning. In European conference on computer vision (ECCV) (pp. 639–654). Springer.

  • Li, H., Kadav, A., Durdanovic, I., et al. (2017). Pruning filters for efficient convnets. In International conference on learning representations (ICLR).

  • Li, Y., Lin, S., Zhang, B., et al. (2019). Exploiting kernel sparsity and entropy for interpretable CNN compression. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 2795–2804).

  • Lin, M., Ji, R., Wang, Y., et al. (2020a). Hrank: Filter pruning using high-rank feature map. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1526–1535).

  • Lin, M., Ji, R., Zhang, Y., et al. (2020b). Channel pruning via automatic structure search. In International joint conference on artificial intelligence (IJCAI).

  • Lin, M., Cao, L., Li, S., et al. (2022). Filter sketch for network pruning. IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 33(12), 7091–7100.

  • Lin, M., Cao, L., Zhang, Y., et al. (2022b). Pruning networks with cross-layer ranking & k-reciprocal nearest filters. IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 1–10.

  • Lin, M., Ji, R., Li, S., et al. (2022). Network pruning using adaptive exemplar filters. IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 33(12), 7357–7366.

  • Lin, S., Ji, R., Yan, C., et al. (2019). Towards optimal structured CNN pruning via generative adversarial learning. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 2785–2794).

  • Liu, J., Zhuang, B., Zhuang, Z., et al. (2021a). Discrimination-aware network pruning for deep model compression. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 1–1.

  • Liu, L., Ouyang, W., Wang, X., et al. (2020). Deep learning for generic object detection: A survey. International Journal of Computer Vision (IJCV), 128(2), 261–318.

  • Liu, P., Yuan, W., Fu, J., et al. (2021b). Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. arXiv preprint arXiv:2107.13586

  • Liu, Z., Mu, H., Zhang, X., et al. (2019). Metapruning: Meta learning for automatic neural network channel pruning. In IEEE international conference on computer vision (ICCV) (pp. 3296–3305).

  • Liu, Z., Luo, W., Wu, B., et al. (2020). Bi-real net: Binarizing deep network towards real-network performance. International Journal of Computer Vision (IJCV), 128(1), 202–219.

  • Lohscheller, H. (1984). A subjectively adapted image communication system. IEEE Transactions on Communications (TCOM), 32(12), 1316–1322.

  • Long, S., He, X., & Yao, C. (2021). Scene text detection and recognition: The deep learning era. International Journal of Computer Vision (IJCV), 129(1), 161–184.

  • Luo, J. H., Wu, J., & Lin, W. (2017). Thinet: A filter level pruning method for deep neural network compression. In IEEE international conference on computer vision (ICCV) (pp. 5068–5076).

  • Minaee, S., Boykov, Y., Porikli, F., et al. (2022). Image segmentation using deep learning: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 44(7), 3523–3542.

  • Molchanov, P., Tyree, S., Karras, T., et al. (2017). Pruning convolutional neural networks for resource efficient inference. In International conference of learning representation (ICLR).

  • Nguyen, A., Yosinski, J., & Clune, J. (2015). Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 427–436).

  • Ning, X., Zhao, T., Li, W., et al. (2020). Dsa: More efficient budgeted pruning via differentiable sparsity allocation. In European conference on computer vision (ECCV) (pp. 592–607). Springer.

  • Nirenberg, S., Carcieri, S. M., Jacobs, A. L., et al. (2001). Retinal ganglion cells act largely as independent encoders. Nature, 411(6838), 698–701.

  • Olshausen, B. A., & Field, D. J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583), 607–609.

  • Otter, D. W., Medina, J. R., & Kalita, J. K. (2021). A survey of the usages of deep learning for natural language processing. IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 32(2), 604–624.

  • Paszke, A., Gross, S., Chintala, S., et al. (2017). Automatic differentiation in pytorch. In Neural information processing systems (NeurIPS).

  • Reich, D. S., Mechler, F., & Victor, J. D. (2001). Independent and redundant information in nearby cortical neurons. Science, 294(5551), 2566–2568.

  • Russakovsky, O., Deng, J., Su, H., et al. (2015). Imagenet large scale visual recognition challenge. International Journal of Computer Vision (IJCV), 115(3), 211–252.

  • Sandler, M., Howard, A., Zhu, M., et al. (2018). Mobilenetv2: Inverted residuals and linear bottlenecks. In IEEE conference on computer vision and pattern recognition (CVPR).

  • Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556

  • Szegedy, C., Zaremba, W., Sutskever, I., et al. (2013). Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199

  • Tao, L., & Gao, W. (2021). Efficient channel pruning based on architecture alignment and probability model bypassing. In 2021 IEEE international conference on systems, man, and cybernetics (SMC) (pp. 3232–3237).

  • Tao, L., Gao, W., Li, G., et al. (2023). Adanic: Towards practical neural image compression via dynamic transform routing. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 16,879–16,888).

  • Wang, Y., Zhang, X., Xie, L., et al. (2020). Pruning from scratch. In Proceedings of the AAAI conference on artificial intelligence (AAAI) (pp. 12,273–12,280).

  • Wang, Z., Li, C., & Wang, X. (2021). Convolutional neural network pruning with structural redundancy reduction. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 14,908–14,917).

  • Wu, Y., Qi, Z., Zheng, H., et al. (2021). Deep image compression with latent optimization and piece-wise quantization approximation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 1926–1930).

  • Yao, K., Cao, F., Leung, Y., et al. (2021). Deep neural network compression through interpretability-based filter pruning. Pattern Recognition (PR), 119, 108056.

  • Zhang, N., Pan, Z., Li, T.H., et al. (2023). Improving graph representation for point cloud segmentation via attentive filtering. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 1244–1254).

  • Zhang, Q., Wang, X., Wu, Y. N., et al. (2021). Interpretable CNNS for object classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(10), 3416–3431.

  • Zhang, R., Gao, W., Li, G., et al. (2022). Qinet: Decision surface learning and adversarial enhancement for quasi-immune completion of diverse corrupted point clouds. IEEE Transactions on Geoscience and Remote Sensing, 60, 1–14.

  • Zhang, X. Y., Liu, C. L., & Suen, C. Y. (2020). Towards robust pattern recognition: A review. Proceedings of the IEEE, 108(6), 894–922.

  • Zhang, Y., Tiňo, P., Leonardis, A., et al. (2021). A survey on neural network interpretability. IEEE Transactions on Emerging Topics in Computational Intelligence (TETC), 5(5), 726–742.

  • Zhang, Y., Lin, M., Lin, C. W., et al. (2022). Carrying out CNN channel pruning in a white box. IEEE Transactions on Neural Networks and Learning Systems (TNNLS). https://doi.org/10.1109/TNNLS.2022.3147269

  • Zhou, B., Khosla, A., Lapedriza, A., et al. (2015). Object detectors emerge in deep scene CNNS. In International conference on learning representations (ICLR).

  • Zhou, B., Bau, D., Oliva, A., et al. (2019). Interpreting deep visual representations via network dissection. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 41(9), 2131–2145.


Acknowledgements

This work was supported by Natural Science Foundation of China (62271013, 62031013), Shenzhen Fundamental Research Program (GXWD20201231165807007-20200806163656003), and Shenzhen Science and Technology Plan Basic Research Project (JCYJ20230807120808017).

Author information

Corresponding author

Correspondence to Wei Gao.

Additional information

Communicated by Arun Mallya.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Proof of the Phenomenon Mentioned in Sect. 3.4

To prove that filters in the same layer have the same, deterministic influence on the total parameters and FLOPs of the network, we describe the weight parameters in the t-th layer as a four-dimensional (4D) tensor \(\textbf{W}_{t}\in \mathbb {R}^{n_{t} \times n_{t-1} \times k_{t} \times k_{t}}\), where \(k_t\) is the kernel size, and \(n_{t-1}\) and \(n_{t}\) are the numbers of input and output channels, respectively.

As for the total parameters, as shown in Fig. 3, if a filter in the t-th layer is pruned, we have:

$$\begin{aligned} \Delta P_{total} = n_{t-1} \cdot k_{t} \cdot k_{t} + n_{t+1} \cdot k_{t+1} \cdot k_{t+1}, \end{aligned}$$
(A1)

where \(n_{t-1} \cdot k_{t} \cdot k_{t}\) is the number of parameters of the pruned filter in the t-th layer, and \(n_{t+1} \cdot k_{t+1} \cdot k_{t+1}\) is the number of parameters removed because every filter in the \((t+1)\)-th layer loses one input dimension. Obviously, \(\Delta P_{total}\) is a deterministic value.

Then, according to (Molchanov et al., 2017), FLOPs in the t-th layer can be described as:

$$\begin{aligned} FLOPs(t) = 2H_{t}\cdot W_{t}\cdot \left( n_{t-1}\cdot k_t^2+1\right) \cdot n_t, \end{aligned}$$
(A2)

where \(H_{t}\) and \(W_{t}\) are the height and width of the output feature maps.

If a filter in the t-th layer is pruned, the number of input channels in the \((t+1)\)-th layer will be reduced by one. Moreover, all filters in the \((t+1)\)-th layer will also be reduced by one dimension. Therefore, the changes in FLOPs of the t-th and the \((t+1)\)-th layers are as follows:

For the t-th layer:

$$\begin{aligned} FLOPs(t)_{before} = 2H_{t}\cdot W_{t}\cdot \left( n_{t-1}\cdot k_t^2+1\right) \cdot n_t, \end{aligned}$$
(A3)
$$\begin{aligned} FLOPs(t)_{now} = 2H_{t}\cdot W_{t}\cdot \left( n_{t-1}\cdot k_t^2+1\right) \cdot (n_t-1), \end{aligned}$$
(A4)
$$\begin{aligned} \Delta FLOPs(t)&= FLOPs(t)_{before}-FLOPs(t)_{now}\\&=2H_{t}\cdot W_{t}\cdot \left( n_{t-1}\cdot k_t^2+1\right) . \end{aligned}$$
(A5)

For the \((t+1)\)-th layer:

$$\begin{aligned} FLOPs(t+1)_{before} = 2H_{t+1}\cdot W_{t+1}\cdot \left( n_{t}\cdot k_{t+1}^2+1\right) \cdot n_{t+1}, \end{aligned}$$
(A6)
$$\begin{aligned} FLOPs(t+1)_{now} = 2H_{t+1}\cdot W_{t+1}\cdot \left[ \left( n_{t}-1\right) \cdot k_{t+1}^2+1\right] \cdot n_{t+1}, \end{aligned}$$
(A7)
$$\begin{aligned} \Delta FLOPs(t+1)&= FLOPs(t+1)_{before}-FLOPs(t+1)_{now}\\&=2H_{t+1}\cdot W_{t+1}\cdot k_{t+1}^2\cdot n_{t+1}. \end{aligned}$$
(A8)

Hence, the total FLOPs will be changed by:

$$\begin{aligned} \Delta F_{total}&=\Delta FLOPs(t)+\Delta FLOPs(t+1) \\&=2H_{t}\cdot W_{t}\cdot \left( n_{t-1}\cdot k_t^2+1\right) +2H_{t+1}\cdot W_{t+1}\cdot k_{t+1}^2\cdot n_{t+1}, \end{aligned}$$
(A9)

where the value of \(\Delta F_{total}\) is also deterministic. Therefore, the total FLOPs will be changed by a deterministic value when a filter in the t-th layer is pruned.

In summary, since \(\Delta P_{total}\) and \(\Delta F_{total}\) are deterministic, filters in the same layer have the same and deterministic influence on total parameters and FLOPs of the network.
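Since both reductions depend only on the layer index, they can be tabulated once per layer. The following is a minimal sketch of Eqs. (A1) and (A9) in Python; the helper name pruning_cost and the toy layer configuration are our own illustrations, not taken from the paper:

```python
# Per-filter parameter and FLOPs reductions for pruning one filter in
# layer t, following Eqs. (A1) and (A9). Hypothetical helper for
# illustration only.

def pruning_cost(n, k, h, w, t):
    """n[t]: output channels, k[t]: kernel size, (h[t], w[t]): output
    feature-map size of layer t. Returns (delta_params, delta_flops)
    for removing any single filter from layer t."""
    # Eq. (A1): the filter's own weights, plus one input slice removed
    # from every filter in layer t+1.
    delta_params = n[t - 1] * k[t] ** 2 + n[t + 1] * k[t + 1] ** 2
    # Eq. (A9): FLOPs of the removed filter in layer t, plus the FLOPs
    # saved by the thinner input of layer t+1.
    delta_flops = (2 * h[t] * w[t] * (n[t - 1] * k[t] ** 2 + 1)
                   + 2 * h[t + 1] * w[t + 1] * k[t + 1] ** 2 * n[t + 1])
    return delta_params, delta_flops

# Toy example: 3x3 convolutions with 64 -> 128 -> 128 channels on
# 32x32 output feature maps; every filter in layer 1 has the same cost.
n, k = {0: 64, 1: 128, 2: 128}, {1: 3, 2: 3}
h = w = {1: 32, 2: 32}
print(pruning_cost(n, k, h, w, t=1))  # (1728, 3540992)
```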

Appendix B: Proof of the Rationality of Eq. (13)

\({\varvec{\Phi }}_t^i \in (0,1)\) in Eq. (11) represents the probability that there are i remaining filters in the t-th layer.

We know:

$$\begin{aligned} \sum _{i=1}^{n_t}{\varvec{\Phi }}_t^i = 1. \end{aligned}$$
(B10)

Therefore, we have:

$$\begin{aligned} 1\cdot \sum _{i=1}^{n_t}{\varvec{\Phi }}_t^i<\sum _{i=1}^{n_t}i\cdot {\varvec{\Phi }}_t^i<n_t\cdot \sum _{i=1}^{n_t}{\varvec{\Phi }}_t^i, \end{aligned}$$
(B11)
$$\begin{aligned} 1<\sum _{i=1}^{n_t}i\cdot {\varvec{\Phi }}_t^i<n_t. \end{aligned}$$
(B12)

Thus:

$$\begin{aligned} \sum _{i=1}^{n_t}i\cdot {\varvec{\Phi }}_t^i \in \left( 1,\, n_t\right) . \end{aligned}$$
(B13)

Since \(\textbf{R}_t \in \mathbb {R}^{n_t}\) and at least one filter must be reserved in each layer (\(\Vert \textbf{R}_t\Vert _0\ge 1\)), we have:

$$\begin{aligned} 1\le \Vert \textbf{R}_t\Vert _0\le n_t. \end{aligned}$$
(B14)

Therefore, the value ranges of \(\sum _{i=1}^{n_t}i\cdot {\varvec{\Phi }}_t^i\) and \(\Vert \textbf{R}_t\Vert _0\) are similar (\(\sum _{i=1}^{n_t}i\cdot {\varvec{\Phi }}_t^i\) is a relaxation of \(\Vert \textbf{R}_t\Vert _0\)). Thus, we make such a reasonable definition in Eq. (13) in the full paper as:

$$\begin{aligned} \Vert \textbf{R}_t\Vert _0 \approx \sum _{i=1}^{n_t}i\cdot {\varvec{\Phi }}_t^i. \end{aligned}$$
(B15)
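To make the relaxation concrete, the following is a minimal sketch (with assumed shapes; the function name is our own) showing that \(\sum _{i=1}^{n_t}i\cdot {\varvec{\Phi }}_t^i\) is a differentiable scalar lying strictly inside \((1, n_t)\) when \({\varvec{\Phi }}_t\) is the softmax of the trainable parameters \({\varvec{\Theta }}_t\):

```python
import torch

def expected_filter_count(theta_t: torch.Tensor) -> torch.Tensor:
    """theta_t: shape (n_t,), trainable logits for layer t. Returns the
    relaxation sum_i i * Phi_t^i of ||R_t||_0 as a differentiable scalar."""
    phi_t = torch.softmax(theta_t, dim=0)        # Phi_t^i in (0, 1), sums to 1
    idx = torch.arange(1, theta_t.numel() + 1,   # i = 1, ..., n_t
                       dtype=theta_t.dtype)
    return (idx * phi_t).sum()                   # strictly between 1 and n_t

theta = torch.zeros(64, requires_grad=True)      # n_t = 64 filters
count = expected_filter_count(theta)             # uniform Phi_t -> 32.5
count.backward()                                 # gradients flow back to theta
```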

Appendix C: Proof of the Differentiability of the Loss Function in Eq. (16)

Since the trainable parameters in each layer are \(\textbf{W}_t\) and \({\varvec{\Theta }}_t\), we need to prove the differentiability of each term of the loss function \(\mathcal {L}\) with respect to these parameters. Firstly, \({\mathcal {\widetilde{L}}_{FLOPs}}\) and \(\mathcal {\widetilde{L}}_{Params}\) are obviously differentiable with respect to \({\varvec{\Theta }}_t\). Meanwhile, \(\mathcal {L}_{task}\) is naturally differentiable with respect to \(\textbf{W}_t\). However, minimizing \(\mathcal {L}_{task}\) should not only find the optimal weights \(\textbf{W}_t\) but also update \({\varvec{\Theta }}_t\). Therefore, the key problem is how to guarantee the differentiability of \(\mathcal {L}_{task}\) with respect to \({\varvec{\Theta }}_t\).

If \(\mathcal {L}_{task}\) is the cross-entropy loss for a classification model with C classes, we have:

$$\begin{aligned} \mathcal {L}_{task} = -\sum _{c=1}^C y_c\cdot \log (\hat{y_c}), \end{aligned}$$
(C16)

where \(y_c\) is the ground truth and \(\hat{y_c}\) is the prediction.

We design a mixture of all the possible pruning masks weighted by \({\varvec{\Phi }}_t\) in Eq. (18) as:

$$\begin{aligned} \begin{aligned} {\varvec{\mathcal {R}}}_t&= \sum _{i=1}^{n_t} {\varvec{\Phi }}_t^i\;determine\left( i,{\varvec{\Gamma }}_t\right) \\&=\sum _{i=1}^{n_t} \frac{e^{{\varvec{\Theta }}_t^i}}{\sum _{k=1}^{n_t}e^{{\varvec{\Theta }}_t^k}}\;determine\left( i,{\varvec{\Gamma }}_t\right) . \end{aligned} \end{aligned}$$
(C17)

Once \({\varvec{\Gamma }}_t\) is calculated, \(determine\left( i,{\varvec{\Gamma }}_t\right) \in \mathbb {R}^{n_t}\) is fixed. For example, if \(max\left( {\varvec{\Gamma }}_t\right) ={\varvec{\Gamma }}_t^2\), then \(determine\left( 1,{\varvec{\Gamma }}_t\right) =(0,1,0,\cdots ,0)\) (only the second element is 1, and all other elements are 0). Then \({\varvec{\mathcal {R}}}_t \in \mathbb {R}^{n_t}\) depends only on \({\varvec{\Theta }}_t\), since all the \(determine\left( i,{\varvec{\Gamma }}_t\right) , 1\le i\le n_t\), are fixed. We consider \({\varvec{\mathcal {R}}}_t\) as the approximation of \(\textbf{R}_t\).

We re-weight the original feature maps \(\textbf{O}_t\) of the t-th layer (\(\textbf{O}_t^i\) is the feature map generated by the i-th filter) by \({\varvec{\mathcal {R}}}_t\) as shown in Eq. (19), where each feature map \(\textbf{O}_t^i\) is scaled by the corresponding element \({\varvec{\mathcal {R}}}_t^i\), and \({\varvec{\mathcal {O}}}_t\) is the differentiable approximation of the pruned feature maps of the t-th layer.

We know \(\hat{y_c}\) is originally generated from the feature maps \(\textbf{O}_t\) layer by layer. We now generate \(\hat{y_c}\) from \({\varvec{\mathcal {O}}}_t\) instead, which preserves the differentiability with respect to \({\varvec{\Theta }}_t\) layer by layer. In this way, the differentiability of \(\mathcal {L}_{task}\) with respect to \({\varvec{\Theta }}_t\) is guaranteed. In summary, this completes the proof of the differentiability of each term in our loss function with respect to the involved parameters.
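For illustration, here is a minimal PyTorch sketch of this construction. We assume, consistent with the example above, that \(determine(i,{\varvec{\Gamma }}_t)\) keeps the i filters with the largest importance scores in \({\varvec{\Gamma }}_t\); the function name and shapes are our own, not the authors' released code:

```python
import torch

def soft_masked_features(o_t, theta_t, gamma_t):
    """o_t: (B, n_t, H, W) feature maps, theta_t: (n_t,) trainable logits,
    gamma_t: (n_t,) fixed importance scores. Returns the differentiable
    approximation of the pruned feature maps (Eq. (19))."""
    n_t = theta_t.numel()
    phi_t = torch.softmax(theta_t, dim=0)              # Phi_t, as in Eq. (C17)
    order = torch.argsort(gamma_t, descending=True)    # rank filters by Gamma_t
    # determine(i, Gamma_t): binary mask keeping the top-i filters; it is
    # constant with respect to theta_t once Gamma_t has been computed.
    masks = torch.zeros(n_t, n_t)
    for i in range(1, n_t + 1):
        masks[i - 1, order[:i]] = 1.0
    r_t = phi_t @ masks                    # R_t = sum_i Phi_t^i * determine(i, .)
    return o_t * r_t.view(1, n_t, 1, 1)    # channel-wise re-weighting

o = torch.randn(2, 8, 16, 16)              # toy feature maps, n_t = 8
theta = torch.zeros(8, requires_grad=True)
gamma = torch.rand(8)
out = soft_masked_features(o, theta, gamma)  # gradients reach theta via Phi_t
```

Because only \({\varvec{\Phi }}_t\) carries gradients, back-propagating a task loss computed from the re-weighted features updates \({\varvec{\Theta }}_t\) while the hard masks stay fixed.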

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Guo, Y., Gao, W. & Li, G. Interpretable Task-inspired Adaptive Filter Pruning for Neural Networks Under Multiple Constraints. Int J Comput Vis 132, 2060–2076 (2024). https://doi.org/10.1007/s11263-023-01972-x

