Abstract
In the last decade, convolutional neural networks (CNNs) have evolved to become the dominant models for various computer vision tasks, but their high memory requirements and computational costs prevent them from being deployed on low-memory devices. One popular, straightforward approach to compressing CNNs is network slimming, which imposes an \(\ell _1\) penalty on the channel-associated scaling factors in the batch normalization layers during training. In this way, channels with low scaling factors are identified as insignificant and are pruned from the model. In this paper, we propose replacing the \(\ell _1\) penalty with the \(\ell _p\) and transformed \(\ell _1\) (T\(\ell _1\)) penalties, since these nonconvex penalties have outperformed \(\ell _1\) in yielding sparser, satisfactory solutions in various compressed sensing problems. In our numerical experiments, we demonstrate network slimming with the \(\ell _p\) and T\(\ell _1\) penalties on VGGNet and DenseNet trained on CIFAR 10/100. The results demonstrate that the nonconvex penalties compress CNNs better than \(\ell _1\). In addition, T\(\ell _1\) preserves the model accuracy after channel pruning, and \(\ell _{1/2}\) and \(\ell _{3/4}\) yield compressed models with accuracies comparable to those obtained with \(\ell _1\) after retraining.
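In practice, the sparsity penalty enters training by adding its (sub)gradient with respect to the batch normalization scaling factors to their gradients at each step. The following PyTorch-style sketch illustrates this under stated assumptions: the helper names (slimming_penalty_grad, add_sparsity_grad), the hyperparameters lam, p, a, and the small eps used to stabilize the \(\ell _p\) subgradient at zero are illustrative choices, not the authors' released code; the transformed \(\ell _1\) penalty is taken as \(T_a(x) = (a+1)|x|/(a+|x|)\), following the cited Zhang and Xin papers.

```python
import torch
import torch.nn as nn

def slimming_penalty_grad(gamma, penalty="l1", p=0.5, a=1.0, eps=1e-8):
    """(Sub)gradient of the sparsity penalty at the BN scaling factors gamma.

    penalty: "l1"  -> |x|
             "lp"  -> |x|^p with 0 < p < 1
             "tl1" -> transformed l1: (a + 1)|x| / (a + |x|)
    """
    s = torch.sign(gamma)
    if penalty == "l1":
        return s
    if penalty == "lp":
        # d/dx |x|^p = p * |x|^(p-1) * sign(x); eps avoids blow-up at x = 0
        return p * s * (gamma.abs() + eps).pow(p - 1.0)
    if penalty == "tl1":
        # d/dx (a+1)|x|/(a+|x|) = a(a+1) / (a + |x|)^2 * sign(x)
        return a * (a + 1.0) * s / (a + gamma.abs()).pow(2)
    raise ValueError(f"unknown penalty: {penalty}")

def add_sparsity_grad(model, lam=1e-4, penalty="tl1"):
    """Add lam * d(penalty)/d(gamma) to the gradient of every BN scaling factor.
    Call after loss.backward() and before optimizer.step()."""
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d) and m.weight.grad is not None:
            m.weight.grad.add_(lam * slimming_penalty_grad(m.weight.data, penalty=penalty))
```

After training, channels whose scaling factors fall below a global threshold would then be pruned and the slimmed network fine-tuned, as in the original network slimming procedure.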
References
Aghasi, A., Abdi, A., Romberg, J.: Fast convex pruning of deep neural networks. SIAM J. Math. Data Sci. 2(1), 158–188 (2020)
Alvarez, J.M., Salzmann, M.: Learning the number of neurons in deep networks. In: Advances in Neural Information Processing Systems. pp. 2270–2278 (2016)
Candès, E.J., Romberg, J., Tao, T.: Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Theory 52(2), 489–509 (2006)
Candès, E.J., Romberg, J.K., Tao, T.: Stable signal recovery from incomplete and inaccurate measurements. Commun. Pure Appl. Math. 59(8), 1207–1223 (2006)
Cao, W., Sun, J., Xu, Z.: Fast image deconvolution using closed-form thresholding formulas of \(L_q (q= 1/2, 2/3)\) regularization. J. Vis. Commun. Image Represent. 24(1), 31–41 (2013)
Changpinyo, S., Sandler, M., Zhmoginov, A.: The power of sparsity in convolutional neural networks. arXiv preprint arXiv:1702.06257 (2017)
Chartrand, R.: Exact reconstruction of sparse signals via nonconvex minimization. IEEE Signal Process. Lett. 14(10), 707–710 (2007)
Chartrand, R., Staneva, V.: Restricted isometry properties and nonconvex compressive sensing. Inverse Prob. 24(3), 035020 (2008)
Chartrand, R., Yin, W.: Iteratively reweighted algorithms for compressive sensing. In: 2008 IEEE International Conference on Acoustics, Speech and Signal Processing. pp. 3869–3872. IEEE (2008)
Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 40(4), 834–848 (2017)
Chen, W., Wilson, J., Tyree, S., Weinberger, K., Chen, Y.: Compressing neural networks with the hashing trick. In: International conference on machine learning. pp. 2285–2294 (2015)
Courbariaux, M., Bengio, Y., David, J.P.: BinaryConnect: training deep neural networks with binary weights during propagations. In: Advances in Neural Information Processing Systems. pp. 3123–3131 (2015)
Denton, E.L., Zaremba, W., Bruna, J., LeCun, Y., Fergus, R.: Exploiting linear structure within convolutional networks for efficient evaluation. In: Advances in neural information processing systems. pp. 1269–1277 (2014)
Dinh, T., Xin, J.: Convergence of a relaxed variable splitting method for learning sparse neural networks via \(\ell _1\),\(\ell _0\), and transformed-\(\ell _1\) penalties. In: Proceedings of SAI Intelligent Systems Conference. pp. 360–374. Springer (2020)
Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96(456), 1348–1360 (2001)
Han, S., Pool, J., Tran, J., Dally, W.: Learning both weights and connections for efficient neural network. In: Advances in Neural Information Processing Systems. pp. 1135–1143 (2015)
He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: Proceedings of the IEEE International Conference on Computer Vision. pp. 1026–1034 (2015)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 770–778 (2016)
Hu, H., Peng, R., Tai, Y.W., Tang, C.K.: Network trimming: a data-driven neuron pruning approach towards efficient deep architectures. arXiv preprint arXiv:1607.03250 (2016)
Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 4700–4708 (2017)
Huang, G., Sun, Yu., Liu, Z., Sedra, D., Weinberger, K.Q.: Deep networks with stochastic depth. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9908, pp. 646–661. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46493-0_39
Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning. pp. 448–456 (2015)
Jaderberg, M., Vedaldi, A., Zisserman, A.: Speeding up convolutional neural networks with low rank expansions. arXiv preprint arXiv:1405.3866 (2014)
Jung, H., Ye, J.C., Kim, E.Y.: Improved k-t BLAST and k-t SENSE using FOCUSS. Phys. Med. Biol. 52(11), 3201 (2007)
Krishnan, D., Fergus, R.: Fast image deconvolution using hyper-laplacian priors. In: Advances in Neural Information Processing Systems. pp. 1033–1041 (2009)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems. pp. 1097–1105 (2012)
Li, F., Zhang, B., Liu, B.: Ternary weight networks. arXiv preprint arXiv:1605.04711 (2016)
Li, H., Kadav, A., Durdanovic, I., Samet, H., Graf, H.P.: Pruning filters for efficient convnets. arXiv preprint arXiv:1608.08710 (2016)
Li, Y., Wu, C., Duan, Y.: The \(\text{TV}^p\) regularized Mumford-Shah model for image labeling and segmentation. IEEE Trans. Image Process. 29, 7061–7075 (2020)
Lin, M., Chen, Q., Yan, S.: Network in network. arXiv preprint arXiv:1312.4400 (2013)
Liu, Z., Li, J., Shen, Z., Huang, G., Yan, S., Zhang, C.: Learning efficient convolutional networks through network slimming. In: Proceedings of the IEEE International Conference on Computer Vision. pp. 2736–2744 (2017)
Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 3431–3440 (2015)
Lou, Y., Osher, S., Xin, J.: Computational aspects of constrained \(L_1-L_2\) minimization for compressive sensing. In: Le Thi, H.A., Pham Dinh, T., Nguyen, N.T. (eds.) Modelling, Computation and Optimization in Information Systems and Management Sciences. AISC, vol. 359, pp. 169–180. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-18161-5_15
Lou, Y., Yin, P., He, Q., Xin, J.: Computing sparse representation in a highly coherent dictionary based on difference of \(L_1\) and \(L_2\). J. Sci. Comput. 64(1), 178–196 (2015)
Lustig, M., Donoho, D., Pauly, J.M.: Sparse MRI: the application of compressed sensing for rapid MR imaging. Magn. Reson. Med. 58(6), 1182–1195 (2007)
Ma, R., Miao, J., Niu, L., Zhang, P.: Transformed \(\ell _1\) regularization for learning sparse deep neural networks. Neural Networks 119, 286–298 (2019)
Qian, Y., Jia, S., Zhou, J., Robles-Kelly, A.: Hyperspectral unmixing via \(L_{1/2}\) sparsity-constrained nonnegative matrix factorization. IEEE Trans. Geosci. Remote Sens. 49(11), 4282–4297 (2011)
Ronneberger, O., Fischer, P., Brox, T.: U-net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28
Scardapane, S., Comminiello, D., Hussain, A., Uncini, A.: Group sparse regularization for deep neural networks. Neurocomputing 241, 81–89 (2017)
Shor, N.Z.: Minimization methods for non-differentiable functions, vol. 3. Springer Science & Business Media (2012)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: International Conference on Machine Learning. pp. 1139–1147 (2013)
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 2818–2826 (2016)
Wen, W., Wu, C., Wang, Y., Chen, Y., Li, H.: Learning structured sparsity in deep neural networks. In: Advances in Neural Information Processing Systems. pp. 2074–2082 (2016)
Wen, W., Xu, C., Wu, C., Wang, Y., Chen, Y., Li, H.: Coordinating filters for faster deep neural networks. In: Proceedings of the IEEE International Conference on Computer Vision. pp. 658–666 (2017)
Xu, Y., et al.: Trained rank pruning for efficient deep neural networks. arXiv preprint arXiv:1812.02402 (2018)
Xu, Y., et al.: TRP: trained rank pruning for efficient deep neural networks. arXiv preprint arXiv:2004.14566 (2020)
Xu, Z., Chang, X., Xu, F., Zhang, H.: \({\ell _{1/2}}\) regularization: a thresholding representation theory and a fast solver. IEEE Trans. Neural Netw. Learn. Syst. 23(7), 1013–1027 (2012)
Xu, Z., Guo, H., Wang, Y., Hai, Z.: Representative of \(L_{1/2}\) regularization among \(L_q (0 \le q \le 1)\) regularizations: an experimental study based on phase diagram. Acta Automatica Sinica 38(7), 1225–1228 (2012)
Xu, Z., Zhang, H., Wang, Y., Chang, X., Liang, Y.: \(L_{1/2}\) regularization. Sci. China Inf. Sci. 53(6), 1159–1169 (2010)
Xue, F., Xin, J.: Learning sparse neural networks via \(\ell _0\) and T\(\ell _1\) by a relaxed variable splitting method with application to multi-scale curve classification. In: Le Thi, H.A., Le, H.M., Pham Dinh, T. (eds.) WCGO 2019. AISC, vol. 991, pp. 800–809. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-21803-4_80
Yin, P., Lou, Y., He, Q., Xin, J.: Minimization of \(\ell _{1-2}\) for compressed sensing. SIAM J. Sci. Comput. 37(1), A536–A563 (2015)
Yin, P., Zhang, S., Lyu, J., Osher, S., Qi, Y., Xin, J.: BinaryRelax: a relaxation approach for training deep neural networks with quantized weights. SIAM J. Imag. Sci. 11(4), 2205–2223 (2018)
Yin, W., Osher, S., Goldfarb, D., Darbon, J.: Bregman iterative algorithms for \(\ell _1\)-minimization with applications to compressed sensing. SIAM J. Imag. Sci. 1(1), 143–168 (2008)
Yuan, M., Lin, Y.: Model selection and estimation in regression with grouped variables. J. Royal Stat. Soc.: Series B (Stat. Methodol.) 68(1), 49–67 (2006)
Zhang, S., Xin, J.: Minimization of transformed \( l_1 \) penalty: closed form representation and iterative thresholding algorithms. Commun. Math. Sci. 15(2), 511–537 (2017)
Zhang, S., Xin, J.: Minimization of transformed \(l_1\) penalty: theory, difference of convex function algorithm, and robust application in compressed sensing. Math. Program. 169(1), 307–336 (2018)
Zhang, S., Yin, P., Xin, J.: Transformed Schatten-1 iterative thresholding algorithms for low rank matrix completion. Commun. Math. Sci. 15(3), 839–862 (2017)
Zhu, C., Han, S., Mao, H., Dally, W.J.: Trained ternary quantization. arXiv preprint arXiv:1612.01064 (2016)
Acknowledgments
The work was partially supported by NSF grants IIS-1632935, DMS-1854434, DMS-1952644, and a Qualcomm Faculty Award. The authors thank Mingjie Sun for making the code for [31] available on GitHub.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Bui, K., Park, F., Zhang, S., Qi, Y., Xin, J. (2020). Nonconvex Regularization for Network Slimming: Compressing CNNs Even More. In: Bebis, G., et al. Advances in Visual Computing. ISVC 2020. Lecture Notes in Computer Science, vol 12509. Springer, Cham. https://doi.org/10.1007/978-3-030-64556-4_4
DOI: https://doi.org/10.1007/978-3-030-64556-4_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-64555-7
Online ISBN: 978-3-030-64556-4
eBook Packages: Computer Science (R0)