Interpretable Task-inspired Adaptive Filter Pruning for Neural Networks Under Multiple Constraints

Published in International Journal of Computer Vision

Abstract

Existing filter pruning methods mostly rely on specific data-driven paradigms but lack interpretability. Moreover, these approaches usually assign layer-wise compression ratios under a given FLOPs budget either automatically, via neural architecture search algorithms, or simply by hand, both of which are inefficient. In this paper, we propose a novel interpretable task-inspired adaptive filter pruning method for neural networks to solve the above problems. First, we treat filters as semantic detectors and develop task-inspired importance criteria by evaluating the correlations between input tasks and feature maps, and by observing the information flow through filters between adjacent layers. Second, we draw on human neurobiological mechanisms for better interpretability, where the retained first-layer filters act as individual information receivers. Third, inspired by the observation that each filter has a deterministic impact on FLOPs and network parameters, we provide an efficient adaptive compression-ratio allocation strategy based on a differentiable pruning approximation under multiple budget constraints, while also accounting for the performance objective. The proposed method is validated with extensive experiments on state-of-the-art neural networks; it significantly outperforms existing filter pruning methods and achieves the best trade-off between network compression and task performance. With ResNet-50 on ImageNet, our approach removes 75.49% of the parameters and 70.90% of the FLOPs while suffering only a 2.31% performance degradation.



Data Availability Statement

The datasets generated during and/or analysed during the current study are available in CIFAR-10 at https://www.cs.toronto.edu/~kriz/cifar.html and ImageNet (ILSVRC2012) at https://www.image-net.org/challenges/LSVRC/index.php.

References

  • Bau, D., Zhou, B., Khosla, A., et al. (2017). Network dissection: Quantifying interpretability of deep visual representations. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 3319–3327).

  • Bau, D., Zhu, J. Y., Strobelt, H., et al. (2020). Understanding the role of individual units in a deep neural network. Proceedings of the National Academy of Sciences (PNAS), 117(48), 30071–30078.

  • Chan, L., Hosseini, M. S., & Plataniotis, K. N. (2021). A comprehensive analysis of weakly-supervised semantic segmentation in different image domains. International Journal of Computer Vision (IJCV), 129(2), 361–384.

  • Chen, H., Zhuo, L., Zhang, B., et al. (2021). Binarized neural architecture search for efficient object recognition. International Journal of Computer Vision (IJCV), 129(2), 501–516.

  • Chin, T. W., Ding, R., Zhang, C., et al. (2020). Towards efficient model compression via learned global ranking. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1518–1528).

  • Crick, F., & Koch, C. (1995). Are we aware of neural activity in primary visual cortex? Nature, 375(6527), 121–123.

  • Ding, X., Ding, G., Guo, Y., et al. (2019). Centripetal SGD for pruning very deep convolutional networks with complicated structure. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 4943–4953).

  • Ding, X., Hao, T., Tan, J., et al. (2021). Resrep: Lossless CNN pruning via decoupling remembering and forgetting. In IEEE international conference on computer vision (ICCV) (pp. 4510–4520).

  • Dong, X., & Yang, Y. (2019). Network pruning via transformable architecture search. In Neural information processing systems (NeurIPS).

  • Dong, Y., Ni, R., Li, J., et al. (2019). Stochastic quantization for learning accurate low-bit deep neural networks. International Journal of Computer Vision (IJCV), 127(11), 1629–1642.

  • Fan, S., Gao, W., & Li, G. (2022). Salient object detection for point clouds. In European conference on computer vision (pp. 1–19). Springer.

  • Fu, C., Li, G., Song, R., et al. (2022). Octattention: Octree-based large-scale contexts model for point cloud compression. In Proceedings of the AAAI conference on artificial intelligence (pp. 625–633).

  • Gao, W., Tao, L., Zhou, L., et al. (2020). Low-rate image compression with super-resolution learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops (pp. 154–155).

  • Gao, W., Liao, G., Ma, S., et al. (2021). Unified information fusion network for multi-modal RGB-D and RGB-T salient object detection. IEEE Transactions on Circuits and Systems for Video Technology, 32(4), 2091–2106.

  • Gao, W., Zhou, L., & Tao, L. (2021). A fast view synthesis implementation method for light field applications. ACM Transactions on Multimedia Computing Communications and Applications (TOMM), 17(4), 1–20.

  • Gao, W., Guo, Y., Ma, S., et al. (2022). Efficient neural network compression inspired by compressive sensing. IEEE Transactions on Neural Networks and Learning Systems. https://doi.org/10.1109/TNNLS.2022.3186008

  • Gao, W., Ye, H., Li, G., et al. (2022b). Openpointcloud: An open-source algorithm library of deep learning based point cloud compression. In Proceedings of the 30th ACM international conference on multimedia (pp. 7347–7350).

  • Gao, W., Fan, S., Li, G., et al. (2023). A thorough benchmark and a new model for light field saliency detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(7), 8003–8019. https://doi.org/10.1109/TPAMI.2023.3235415

  • Geng, C., Huang, S. J., & Chen, S. (2021). Recent advances in open set recognition: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 43(10), 3614–3631.

  • Gross, C. G. (2002). Genealogy of the “grandmother cell.” The Neuroscientist, 8(5), 512–518.

  • Guo, S., Wang, Y., Li, Q., et al. (2020). Dmcp: Differentiable markov channel pruning for neural networks. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1536–1544).

  • Guo, Y., & Gao, W. (2022). Semantic-driven automatic filter pruning for neural networks. In 2022 IEEE international conference on multimedia and expo (ICME) (pp. 1–6). IEEE.

  • He, K., Zhang, X., Ren, S., et al. (2016). Deep residual learning for image recognition. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 770–778).

  • He, Y., Lin, J., Liu, Z., et al. (2018). Amc: Automl for model compression and acceleration on mobile devices. In European conference on computer vision (ECCV) (pp. 784–800).

  • He, Y., Ding, Y., Liu, P., et al. (2020). Learning filter pruning criteria for deep convolutional neural networks acceleration. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 2006–2015).

  • Huang, Z., & Wang, N. (2018). Data-driven sparse structure selection for deep neural networks. In European conference on computer vision (ECCV) (pp. 304–320).

  • Krizhevsky, A., Hinton, G., et al. (2009). Learning multiple layers of features from tiny images. Citeseer.

  • Krizhevsky, A., Sutskever, I., Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In Neural information processing systems (NeurIPS).

  • Li, B., Wu, B., Su, J., et al. (2020). Eagleeye: Fast sub-net evaluation for efficient neural network pruning. In European conference on computer vision (ECCV) (pp. 639–654). Springer.

  • Li, H., Kadav, A., Durdanovic, I., et al. (2017). Pruning filters for efficient convnets. In International conference on learning representations (ICLR).

  • Li, Y., Lin, S., Zhang, B., et al. (2019). Exploiting kernel sparsity and entropy for interpretable CNN compression. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 2795–2804).

  • Lin, M., Ji, R., Wang, Y., et al. (2020a). Hrank: Filter pruning using high-rank feature map. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1526–1535).

  • Lin, M., Ji, R., Zhang, Y., et al. (2020b). Channel pruning via automatic structure search. In International joint conference on artificial intelligence (IJCAI).

  • Lin, M., Cao, L., Li, S., et al. (2022). Filter sketch for network pruning. IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 33(12), 7091–7100.

  • Lin, M., Cao, L., Zhang, Y., et al. (2022b). Pruning networks with cross-layer ranking & k-reciprocal nearest filters. IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 1–10.

  • Lin, M., Ji, R., Li, S., et al. (2022). Network pruning using adaptive exemplar filters. IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 33(12), 7357–7366.

  • Lin, S., Ji, R., Yan, C., et al. (2019). Towards optimal structured CNN pruning via generative adversarial learning. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 2785–2794).

  • Liu, J., Zhuang, B., Zhuang, Z., et al. (2021a). Discrimination-aware network pruning for deep model compression. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 1–1.

  • Liu, L., Ouyang, W., Wang, X., et al. (2020). Deep learning for generic object detection: A survey. International Journal of Computer Vision (IJCV), 128(2), 261–318.

  • Liu, P., Yuan, W., Fu, J., et al. (2021b). Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. arXiv preprint arXiv:2107.13586

  • Liu, Z., Mu, H., Zhang, X., et al. (2019). Metapruning: Meta learning for automatic neural network channel pruning. In IEEE international conference on computer vision (ICCV) (pp. 3296–3305).

  • Liu, Z., Luo, W., Wu, B., et al. (2020). Bi-real net: Binarizing deep network towards real-network performance. International Journal of Computer Vision (IJCV), 128(1), 202–219.

  • Lohscheller, H. (1984). A subjectively adapted image communication system. IEEE Transactions on Communications (TCOM), 32(12), 1316–1322.

  • Long, S., He, X., & Yao, C. (2021). Scene text detection and recognition: The deep learning era. International Journal of Computer Vision (IJCV), 129(1), 161–184.

  • Luo, J. H., Wu, J., & Lin, W. (2017). Thinet: A filter level pruning method for deep neural network compression. In IEEE international conference on computer vision (ICCV) (pp. 5068–5076).

  • Minaee, S., Boykov, Y., Porikli, F., et al. (2022). Image segmentation using deep learning: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 44(7), 3523–3542.

  • Molchanov, P., Tyree, S., Karras, T., et al. (2017). Pruning convolutional neural networks for resource efficient inference. In International conference of learning representation (ICLR).

  • Nguyen, A., Yosinski, J., & Clune, J. (2015). Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 427–436).

  • Ning, X., Zhao, T., Li, W., et al. (2020). Dsa: More efficient budgeted pruning via differentiable sparsity allocation. In European conference on computer vision (ECCV) (pp. 592–607). Springer.

  • Nirenberg, S., Carcieri, S. M., Jacobs, A. L., et al. (2001). Retinal ganglion cells act largely as independent encoders. Nature, 411(6838), 698–701.

  • Olshausen, B. A., & Field, D. J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583), 607–609.

  • Otter, D. W., Medina, J. R., & Kalita, J. K. (2021). A survey of the usages of deep learning for natural language processing. IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 32(2), 604–624.

  • Paszke, A., Gross, S., Chintala, S., et al. (2017). Automatic differentiation in pytorch. In Neural information processing systems (NeurIPS).

  • Reich, D. S., Mechler, F., & Victor, J. D. (2001). Independent and redundant information in nearby cortical neurons. Science, 294(5551), 2566–2568.

  • Russakovsky, O., Deng, J., Su, H., et al. (2015). Imagenet large scale visual recognition challenge. International Journal of Computer Vision (IJCV), 115(3), 211–252.

  • Sandler, M., Howard, A., Zhu, M., et al. (2018). Mobilenetv2: Inverted residuals and linear bottlenecks. In IEEE conference on computer vision and pattern recognition (CVPR).

  • Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556

  • Szegedy, C., Zaremba, W., Sutskever, I., et al. (2013). Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199

  • Tao, L., & Gao, W. (2021). Efficient channel pruning based on architecture alignment and probability model bypassing. In 2021 IEEE international conference on systems, man, and cybernetics (SMC) (pp. 3232–3237).

  • Tao, L., Gao, W., Li, G., et al. (2023). Adanic: Towards practical neural image compression via dynamic transform routing. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 16,879–16,888).

  • Wang, Y., Zhang, X., Xie, L., et al. (2020). Pruning from scratch. In Proceedings of the AAAI conference on artificial intelligence (AAAI) (pp. 12,273–12,280).

  • Wang, Z., Li, C., & Wang, X. (2021). Convolutional neural network pruning with structural redundancy reduction. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 14,908–14,917).

  • Wu, Y., Qi, Z., Zheng, H., et al. (2021). Deep image compression with latent optimization and piece-wise quantization approximation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 1926–1930).

  • Yao, K., Cao, F., Leung, Y., et al. (2021). Deep neural network compression through interpretability-based filter pruning. Pattern Recognition (PR), 119, 108056.

  • Zhang, N., Pan, Z., Li, T.H., et al. (2023). Improving graph representation for point cloud segmentation via attentive filtering. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 1244–1254).

  • Zhang, Q., Wang, X., Wu, Y. N., et al. (2021). Interpretable CNNS for object classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(10), 3416–3431.

  • Zhang, R., Gao, W., Li, G., et al. (2022). Qinet: Decision surface learning and adversarial enhancement for quasi-immune completion of diverse corrupted point clouds. IEEE Transactions on Geoscience and Remote Sensing, 60, 1–14.

  • Zhang, X. Y., Liu, C. L., & Suen, C. Y. (2020). Towards robust pattern recognition: A review. Proceedings of the IEEE, 108(6), 894–922.

  • Zhang, Y., Tiňo, P., Leonardis, A., et al. (2021). A survey on neural network interpretability. IEEE Transactions on Emerging Topics in Computational Intelligence (TETC), 5(5), 726–742.

  • Zhang, Y., Lin, M., Lin, C. W., et al. (2022). Carrying out CNN channel pruning in a white box. IEEE Transactions on Neural Networks and Learning Systems (TNNLS). https://doi.org/10.1109/TNNLS.2022.3147269

  • Zhou, B., Khosla, A., Lapedriza, A., et al. (2015). Object detectors emerge in deep scene CNNS. In International conference on learning representations (ICLR).

  • Zhou, B., Bau, D., Oliva, A., et al. (2019). Interpreting deep visual representations via network dissection. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 41(9), 2131–2145.


Acknowledgements

This work was supported by Natural Science Foundation of China (62271013, 62031013), Shenzhen Fundamental Research Program (GXWD20201231165807007-20200806163656003), and Shenzhen Science and Technology Plan Basic Research Project (JCYJ20230807120808017).

Author information

Corresponding author

Correspondence to Wei Gao.

Additional information

Communicated by Arun Mallya.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Proof of the Phenomenon Mentioned in Sect. 3.4

To prove that filters in the same layer have the same, deterministic influence on the total parameters and FLOPs of the network, we describe the weight parameters in the t-th layer as a four-dimensional (4D) tensor \(\textbf{W}_{t}\in \mathbb {R}^{n_{t} \times n_{t-1} \times k_{t} \times k_{t}}\), where \(k_t\) is the kernel size, and \(n_{t-1}\) and \(n_{t}\) are the numbers of input and output channels, respectively.

As for the total parameters, as shown in Fig. 3, if a filter in the t-th layer is pruned, we have:

$$\begin{aligned} \Delta P_{total} = n_{t-1} \cdot k_{t} \cdot k_{t} + n_{t+1} \cdot k_{t+1} \cdot k_{t+1}, \end{aligned}$$
(A1)

where \(n_{t-1} \cdot k_{t} \cdot k_{t}\) is the number of parameters of the pruned filter in the t-th layer, and \(n_{t+1} \cdot k_{t+1} \cdot k_{t+1}\) is the number of parameters removed because every filter in the \((t+1)\)-th layer loses one input dimension. Obviously, \(\Delta P_{total}\) is a deterministic value.

Then, according to (Molchanov et al., 2017), FLOPs in the t-th layer can be described as:

$$\begin{aligned} FLOPs(t) = 2H_{t}\cdot W_{t}\cdot \left( n_{t-1}\cdot k_t^2+1\right) \cdot n_t, \end{aligned}$$
(A2)

where \(H_{t}\) and \(W_{t}\) are the height and width of the output feature maps.

If a filter in the t-th layer is pruned, the number of input channels in the \((t+1)\)-th layer will be reduced by one. Moreover, all filters in the \((t+1)\)-th layer will also be reduced by one dimension. Therefore, the changes in FLOPs of the t-th and the \((t+1)\)-th layers are as follows:

For the t-th layer:

$$\begin{aligned} FLOPs(t)_{before} = 2H_{t}\cdot W_{t}\cdot \left( n_{t-1}\cdot k_t^2+1\right) \cdot n_t, \end{aligned}$$
(A3)
$$\begin{aligned} FLOPs(t)_{now} = 2H_{t}\cdot W_{t}\cdot \left( n_{t-1}\cdot k_t^2+1\right) \cdot (n_t-1), \end{aligned}$$
(A4)
$$\begin{aligned} \Delta FLOPs(t)&= FLOPs(t)_{before}-FLOPs(t)_{now}\\&=2H_{t}\cdot W_{t}\cdot \left( n_{t-1}\cdot k_t^2+1\right) . \end{aligned}$$
(A5)

For the \((t+1)\)-th layer:

$$\begin{aligned} FLOPs(t+1)_{before} = 2H_{t+1}\cdot W_{t+1}\cdot \left( n_{t}\cdot k_{t+1}^2+1\right) \cdot n_{t+1}, \end{aligned}$$
(A6)
$$\begin{aligned} FLOPs(t+1)_{now} = 2H_{t+1}\cdot W_{t+1}\cdot \left[ \left( n_{t}-1\right) \cdot k_{t+1}^2+1\right] \cdot n_{t+1}, \end{aligned}$$
(A7)
$$\begin{aligned} \Delta FLOPs(t+1)&= FLOPs(t+1)_{before}-FLOPs(t+1)_{now}\\&=2H_{t+1}\cdot W_{t+1}\cdot k_{t+1}^2\cdot n_{t+1}. \end{aligned}$$
(A8)

Hence, the total FLOPs will be changed by:

$$\begin{aligned} \Delta F_{total}&=\Delta FLOPs(t)+\Delta FLOPs(t+1) \\&=2H_{t}\cdot W_{t}\cdot \left( n_{t-1}\cdot k_t^2+1\right) +2H_{t+1}\cdot W_{t+1}\cdot k_{t+1}^2\cdot n_{t+1}, \end{aligned}$$
(A9)

where the value of \(\Delta F_{total}\) is also deterministic. Therefore, the total FLOPs will be changed by a deterministic value when a filter in the t-th layer is pruned.

In summary, since \(\Delta P_{total}\) and \(\Delta F_{total}\) are deterministic, filters in the same layer have the same and deterministic influence on total parameters and FLOPs of the network.
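Since both reductions depend only on the layer index, they can be tabulated once per layer. The following is a minimal sketch of Eqs. (A1) and (A9) in Python; the helper name pruning_cost and the toy layer configuration are our own illustrations, not taken from the paper:

```python
# Per-filter parameter and FLOPs reductions for pruning one filter in
# layer t, following Eqs. (A1) and (A9). Hypothetical helper for
# illustration only.

def pruning_cost(n, k, h, w, t):
    """n[t]: output channels, k[t]: kernel size, (h[t], w[t]): output
    feature-map size of layer t. Returns (delta_params, delta_flops)
    for removing any single filter from layer t."""
    # Eq. (A1): the filter's own weights, plus one input slice removed
    # from every filter in layer t+1.
    delta_params = n[t - 1] * k[t] ** 2 + n[t + 1] * k[t + 1] ** 2
    # Eq. (A9): FLOPs of the removed filter in layer t, plus the FLOPs
    # saved by the thinner input of layer t+1.
    delta_flops = (2 * h[t] * w[t] * (n[t - 1] * k[t] ** 2 + 1)
                   + 2 * h[t + 1] * w[t + 1] * k[t + 1] ** 2 * n[t + 1])
    return delta_params, delta_flops

# Toy example: 3x3 convolutions with 64 -> 128 -> 128 channels on
# 32x32 output feature maps; every filter in layer 1 has the same cost.
n, k = {0: 64, 1: 128, 2: 128}, {1: 3, 2: 3}
h = w = {1: 32, 2: 32}
print(pruning_cost(n, k, h, w, t=1))  # (1728, 3540992)
```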

Appendix B: Proof of the Rationality of Eq. (13)

\({\varvec{\Phi }}_t^i \in (0,1)\) in Eq. (11) represents the probability that there are i remaining filters in the t-th layer.

We know:

$$\begin{aligned} \sum _{i=1}^{n_t}{\varvec{\Phi }}_t^i = 1. \end{aligned}$$
(B10)

Therefore, we have:

$$\begin{aligned} 1\cdot \sum _{i=1}^{n_t}{\varvec{\Phi }}_t^i<\sum _{i=1}^{n_t}i\cdot {\varvec{\Phi }}_t^i<n_t\cdot \sum _{i=1}^{n_t}{\varvec{\Phi }}_t^i, \end{aligned}$$
(B11)
$$\begin{aligned} 1<\sum _{i=1}^{n_t}i\cdot {\varvec{\Phi }}_t^i<n_t. \end{aligned}$$
(B12)

Thus:

$$\begin{aligned} \sum _{i=1}^{n_t}i\cdot {\varvec{\Phi }}_t^i \in \left( 1,\, n_t\right) . \end{aligned}$$
(B13)

Since \(\textbf{R}_t \in \mathbb {R}^{n_t}\) and at least one filter must be reserved in each layer (\(\Vert \textbf{R}_t\Vert _0\ge 1\)), we have:

$$\begin{aligned} 1\le \Vert \textbf{R}_t\Vert _0\le n_t. \end{aligned}$$
(B14)

Therefore, the value ranges of \(\sum _{i=1}^{n_t}i\cdot {\varvec{\Phi }}_t^i\) and \(\Vert \textbf{R}_t\Vert _0\) are similar (\(\sum _{i=1}^{n_t}i\cdot {\varvec{\Phi }}_t^i\) is a relaxation of \(\Vert \textbf{R}_t\Vert _0\)). Thus, we make such a reasonable definition in Eq. (13) in the full paper as:

$$\begin{aligned} \Vert \textbf{R}_t\Vert _0 \approx \sum _{i=1}^{n_t}i\cdot {\varvec{\Phi }}_t^i. \end{aligned}$$
(B15)
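To make the relaxation concrete, the following is a minimal sketch (with assumed shapes; the function name is our own) showing that \(\sum _{i=1}^{n_t}i\cdot {\varvec{\Phi }}_t^i\) is a differentiable scalar lying strictly inside \((1, n_t)\) when \({\varvec{\Phi }}_t\) is the softmax of the trainable parameters \({\varvec{\Theta }}_t\):

```python
import torch

def expected_filter_count(theta_t: torch.Tensor) -> torch.Tensor:
    """theta_t: shape (n_t,), trainable logits for layer t. Returns the
    relaxation sum_i i * Phi_t^i of ||R_t||_0 as a differentiable scalar."""
    phi_t = torch.softmax(theta_t, dim=0)        # Phi_t^i in (0, 1), sums to 1
    idx = torch.arange(1, theta_t.numel() + 1,   # i = 1, ..., n_t
                       dtype=theta_t.dtype)
    return (idx * phi_t).sum()                   # strictly between 1 and n_t

theta = torch.zeros(64, requires_grad=True)      # n_t = 64 filters
count = expected_filter_count(theta)             # uniform Phi_t -> 32.5
count.backward()                                 # gradients flow back to theta
```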

Appendix C: Proof of the Differentiability of the Loss Function in Eq. (16)

Since the trainable parameters in each layer are \(\textbf{W}_t\) and \({\varvec{\Theta }}_t\), we need to prove the differentiability of each term of the loss function \(\mathcal {L}\) with respect to these parameters. Firstly, \({\mathcal {\widetilde{L}}_{FLOPs}}\) and \(\mathcal {\widetilde{L}}_{Params}\) are obviously differentiable with respect to \({\varvec{\Theta }}_t\). Meanwhile, \(\mathcal {L}_{task}\) is naturally differentiable with respect to \(\textbf{W}_t\). However, minimizing \(\mathcal {L}_{task}\) should not only find the optimal weights \(\textbf{W}_t\) but also update \({\varvec{\Theta }}_t\). Therefore, the key problem is how to guarantee the differentiability of \(\mathcal {L}_{task}\) with respect to \({\varvec{\Theta }}_t\).

If \(\mathcal {L}_{task}\) is the cross-entropy loss for a classification model with C classes, we have:

$$\begin{aligned} \mathcal {L}_{task} = -\sum _{c=1}^C y_c\cdot \log (\hat{y_c}), \end{aligned}$$
(C16)

where \(y_c\) is the ground truth and \(\hat{y_c}\) is the prediction.

We design a mixture of all the possible pruning masks weighted by \({\varvec{\Phi }}_t\) in Eq. (18) as:

$$\begin{aligned} \begin{aligned} {\varvec{\mathcal {R}}}_t&= \sum _{i=1}^{n_t} {\varvec{\Phi }}_t^i\;determine\left( i,{\varvec{\Gamma }}_t\right) \\&=\sum _{i=1}^{n_t} \frac{e^{{\varvec{\Theta }}_t^i}}{\sum _{k=1}^{n_t}e^{{\varvec{\Theta }}_t^k}}\;determine\left( i,{\varvec{\Gamma }}_t\right) . \end{aligned} \end{aligned}$$
(C17)

Once \({\varvec{\Gamma }}_t\) is calculated, \(determine\left( i,{\varvec{\Gamma }}_t\right) \in \mathbb {R}^{n_t}\) is fixed. For example, if \(max\left( {\varvec{\Gamma }}_t\right) ={\varvec{\Gamma }}_t^2\), then \(determine\left( 1,{\varvec{\Gamma }}_t\right) =(0,1,0,\cdots ,0)\) (only the second element is 1, and all other elements are 0). Then \({\varvec{\mathcal {R}}}_t \in \mathbb {R}^{n_t}\) depends only on \({\varvec{\Theta }}_t\), since all the \(determine\left( i,{\varvec{\Gamma }}_t\right) , 1\le i\le n_t\), are fixed. We consider \({\varvec{\mathcal {R}}}_t\) as the approximation of \(\textbf{R}_t\).

We re-weight the original feature maps \(\textbf{O}_t\) of the t-th layer (\(\textbf{O}_t^i\) is the feature map generated by the i-th filter) by \({\varvec{\mathcal {R}}}_t\) as shown in Eq. (19), where each feature map \(\textbf{O}_t^i\) is scaled by the corresponding element \({\varvec{\mathcal {R}}}_t^i\), and \({\varvec{\mathcal {O}}}_t\) is the differentiable approximation of the pruned feature maps of the t-th layer.

We know \(\hat{y_c}\) is originally generated from the feature maps \(\textbf{O}_t\) layer by layer. We now generate \(\hat{y_c}\) from \({\varvec{\mathcal {O}}}_t\) instead, which preserves the differentiability with respect to \({\varvec{\Theta }}_t\) layer by layer. In this way, the differentiability of \(\mathcal {L}_{task}\) with respect to \({\varvec{\Theta }}_t\) is guaranteed. In summary, this completes the proof of the differentiability of each term in our loss function with respect to the involved parameters.
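For illustration, here is a minimal PyTorch sketch of this construction. We assume, consistent with the example above, that \(determine(i,{\varvec{\Gamma }}_t)\) keeps the i filters with the largest importance scores in \({\varvec{\Gamma }}_t\); the function name and shapes are our own, not the authors' released code:

```python
import torch

def soft_masked_features(o_t, theta_t, gamma_t):
    """o_t: (B, n_t, H, W) feature maps, theta_t: (n_t,) trainable logits,
    gamma_t: (n_t,) fixed importance scores. Returns the differentiable
    approximation of the pruned feature maps (Eq. (19))."""
    n_t = theta_t.numel()
    phi_t = torch.softmax(theta_t, dim=0)              # Phi_t, as in Eq. (C17)
    order = torch.argsort(gamma_t, descending=True)    # rank filters by Gamma_t
    # determine(i, Gamma_t): binary mask keeping the top-i filters; it is
    # constant with respect to theta_t once Gamma_t has been computed.
    masks = torch.zeros(n_t, n_t)
    for i in range(1, n_t + 1):
        masks[i - 1, order[:i]] = 1.0
    r_t = phi_t @ masks                    # R_t = sum_i Phi_t^i * determine(i, .)
    return o_t * r_t.view(1, n_t, 1, 1)    # channel-wise re-weighting

o = torch.randn(2, 8, 16, 16)              # toy feature maps, n_t = 8
theta = torch.zeros(8, requires_grad=True)
gamma = torch.rand(8)
out = soft_masked_features(o, theta, gamma)  # gradients reach theta via Phi_t
```

Because only \({\varvec{\Phi }}_t\) carries gradients, back-propagating a task loss computed from the re-weighted features updates \({\varvec{\Theta }}_t\) while the hard masks stay fixed.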

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Guo, Y., Gao, W. & Li, G. Interpretable Task-inspired Adaptive Filter Pruning for Neural Networks Under Multiple Constraints. Int J Comput Vis 132, 2060–2076 (2024). https://doi.org/10.1007/s11263-023-01972-x

