Abstract
Group convolution has been widely used in the deep learning community to achieve computational efficiency. In this paper, we develop CondenseNet-elasso to eliminate feature correlation among different convolution groups and to alleviate overfitting in neural networks. It applies exclusive lasso regularization to CondenseNet. The exclusive lasso regularizer encourages different convolution groups to use different subsets of input channels and therefore to learn more diversified features. Our experiment results on CIFAR10, CIFAR100 and Tiny ImageNet show that CondenseNets-elasso are more efficient than CondenseNets and other DenseNet variants.









References
Campbell F, Allen GI et al (2017) Within group variable selection through the exclusive lasso. Electron J Statist 11(2):4220–4257
Changpinyo S, Sandler M, Zhmoginov A (2017) The power of sparsity in convolutional neural networks. arXiv preprint arXiv:1702.06257
Chen Y, Li J, Xiao H, Jin X, Yan S, Feng J (2017) Dual path networks. In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R (eds) Advances in neural information processing systems, vol 30. Curran Associates Inc, pp. 4467–4475. https://proceedings.neurips.cc/paper/2017/file/f7e0b956540676a129760a3eae309294-Paper.pdf
Cogswell M, Ahmed F, Girshick R, Zitnick L, Batra D (2015) Reducing overfitting in deep networks by decorrelating representations. arXiv preprint arXiv:1511.06068
Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L (2009) Imagenet: A large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition, pp. 248–255. IEEE
Denil M, Shakibi B, Dinh L, Ranzato M, De Freitas N (2013) Predicting parameters in deep learning. In: Burges CJC, Bottou L, Welling M, Ghahramani Z, Weinberger KQ (eds) Advances in neural information processing systems, vol 26, Curran Associates, Inc., pp. 2148–2156. https://proceedings.neurips.cc/paper/2013/file/7fec306d1e665bc9c748b5d2b99a6e97-Paper.pdf
Dong W, Wu J, Bai Z, Hu Y, Li W, Qiao W, Woźniak M (2021) Mobilegcn applied to low-dimensional node feature learning. Pattern Recognit 112:107788
Gretton A, Bousquet O, Smola A, Schölkopf B (2005) Measuring statistical dependence with hilbert-schmidt norms. In: International conference on algorithmic learning theory, pp. 63–77. Springer
Guo Q, Wu XJ, Kittler J, Feng Z (2020) Self-grouping convolutional neural networks. Neural Netw 132:491–505
Han S, Pool J, Tran J, Dally W (2015) Learning both weights and connections for efficient neural network. In: Cortes C, Lawrence N, Lee D, Sugiyama M, Garnett R (eds) Advances in neural information processing systems, vol 28. Curran Associates, Inc., pp. 1135–1143. https://proceedings.neurips.cc/paper/2015/file/ae0eb3eed39d2bcef4622b2499a05fe6-Paper.pdf
He K, Zhang X, Ren S, Sun J (2015) Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In: Proceedings of the IEEE international conference on computer vision, pp. 1026–1034
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778
He Y, Kang G, Dong X, Fu Y, Yang Y (2018) Soft filter pruning for accelerating deep convolutional neural networks. arXiv preprint arXiv:1808.06866
Hu H, Dey D, Del Giorno A, Hebert M, Bagnell JA (2017) Log-densenet: How to sparsify a densenet. arXiv preprint arXiv:1711.00002
Hu H, Peng R, Tai YW, Tang CK (2016) Network trimming: A data-driven neuron pruning approach towards efficient deep architectures. arXiv preprint arXiv:1607.03250
Huang G, Liu S, Van der Maaten L, Weinberger KQ (2018) Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761
Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4700–4708
Ioannou Y, Robertson D, Cipolla R, Criminisi A (2017) Deep roots: Improving cnn efficiency with hierarchical filter groups. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1231–1240
Ioffe S, Szegedy C (2015) Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167
Jiang F, Grigorev A, Rho S, Tian Z, Fu Y, Jifara W, Adil K, Liu S (2018) Medical image semantic segmentation based on deep learning. Neural Comput Appl 29(5):1257–1265
Ke Q, Zhang J, Wei W, Połap D, Woźniak M, Kośmider L, Damaševĭcius R (2019) A neuro-heuristic approach for recognition of lung diseases from x-ray images. Expert Syst Appl 126:218–232
Kong D, Fujimaki R, Liu J, Nie F, Ding C (2014) Exclusive feature learning on arbitrary structures via \(\ell _{1,2}\)-norm. Adv Neural Inf Process Syst 27:1655–1663
Kornblith S, Norouzi M, Lee H, Hinton G (2019) Similarity of neural network representations revisited. arXiv preprint arXiv:1905.00414
Krizhevsky A, Hinton G et al (2009) Learning multiple layers of features from tiny images. Tech. rep., Citeseer
Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Pereira F, Burges CJC, Bottou L, Weinberger KQ (eds) Advances in neural information processing systems, vol 25. Curran Associates, Inc., pp. 1097–1105. https://proceedings.neurips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf
Li H, Kadav A, Durdanovic I, Samet H, Graf HP (2016) Pruning filters for efficient convnets. arXiv preprint arXiv:1608.08710
Li X, Chen S, Hu X, Yang J (2019) Understanding the disharmony between dropout and batch normalization by variance shift. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2682–2690
Li Y, Gu S, Mayer C, Gool LV, Timofte R (2020) Group sparsity: The hinge between filter pruning and decomposition for network compression. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8018–8027
Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3431–3440
Loshchilov I, Hutter F (2016) Sgdr: Stochastic gradient descent with warm restarts. arXiv preprint arXiv:1608.03983
Luo JH, Wu J, Lin W (2017) Thinet: A filter level pruning method for deep neural network compression. In: Proceedings of the IEEE international conference on computer vision, pp. 5058–5066
Ma KWD, Lewis J, Kleijn WB (2020) The hsic bottleneck: Deep learning without back-propagation. Proc AAAI Conf Artif Intell 34(4):5085–5092. https://doi.org/10.1609/aaai.v34i04.5950
Minaee S, Kafieh R, Sonka M, Yazdani S, Soufi GJ (2020) Deep-covid: Predicting covid-19 from chest x-ray images using deep transfer learning. Med Image Anal 65:101794
Park S, Lee J, Mo S, Shin J (2020) Lookahead: A far-sighted alternative of magnitude-based pruning. In: International Conference on Learning Representations. https://openreview.net/forum?id=ryl3ygHYDB
Peng B, Tan W, Li Z, Zhang S, Xie D, Pu S (2018) Extreme network compression via filter group approximation. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 300–316
Pleiss G, Chen D, Huang G, Li T, van der Maaten L, Weinberger KQ (2017) Memory-efficient implementation of densenets. arXiv preprint arXiv:1707.06990
Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: Unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 779–788
Ren S, He K, Girshick R, Sun J (2015) Faster r-cnn: towards real-time object detection with region proposal networks. In: Cortes C, Lawrence N, Lee D, Sugiyama M, Garnett R (eds) Advances in neural information processing systems, vol 28. Curran Associates, Inc., pp. 91–99. https://proceedings.neurips.cc/paper/2015/file/14bfa6bb14875e45bba028a21ed38046-Paper.pdf
Scardapane S, Comminiello D, Hussain A, Uncini A (2017) Group sparse regularization for deep neural networks. Neurocomputing 241:81–89
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556
Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15(1):1929–1958
Wan L, Zeiler M, Zhang S, Le Cun Y, Fergus R (2013) Regularization of neural networks using dropconnect. In: International Conference on Machine Learning, pp. 1058–1066
Wang W, Li X, Yang J, Lu T (2018) Mixed link networks. arXiv preprint arXiv:1802.01808
Wang X, Kan M, Shan S, Chen X (2019) Fully learnable group convolution for acceleration of deep neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 9049–9058
Wen W, Wu C, Wang Y, Chen Y, Li H (2016) Learning structured sparsity in deep neural networks. In: Lee D, Sugiyama M, Luxburg U, Guyon I, Garnett R (eds) Advances in neural information processing systems, vol 29. Curran Associates, Inc., pp. 2074–2082. https://proceedings.neurips.cc/paper/2016/file/41bfd20a38bb1b0bec75acf0845530a7-Paper.pdf
Woźniak M, Wieczorek M, Siłka J, Połap D (2020) Body pose prediction based on motion sensor data and recurrent neural network. IEEE Trans Ind Inf 17(3):2101–2111
Xie S, Girshick R, Dollár P, Tu Z, He K (2017) Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1492–1500
Yang Y, Zhong Z, Shen T, Lin Z (2018) Convolutional neural networks with alternately updated clique. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2413–2422
Ye J, Lu X, Lin Z, Wang JZ (2018) Rethinking the smaller-norm-less-informative assumption in channel pruning of convolution layers. arXiv preprint arXiv:1802.00124
Yoon J, Hwang SJ (2017) Combined group and exclusive sparsity for deep neural networks. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 3958–3966. JMLR.org
Yu R, Li A, Chen CF, Lai JH, Morariu VI, Han X, Gao M, Lin CY, Davis LS (2018) Nisp: Pruning networks using neuron importance score propagation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9194–9203
Zhang D, Wang H, Figueiredo M, Balzano L (2018) Learning to share: Simultaneous parameter tying and sparsification in deep learning. In: International Conference on Learning Representations. https://openreview.net/forum?id=rypT3fb0b
Zhang T, Qi GJ, Xiao B, Wang J (2017) Interleaved group convolutions. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4373–4382
Zhang X, Zhou X, Lin M, Sun J (2018) Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856
Zhang Z, Li J, Shao W, Peng Z, Zhang R, Wang X, Luo P (2019) Differentiable learning-to-group channels via groupable convolutional neural networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3542–3551
Zhou Y, Jin R, Hoi SCH (2010) Exclusive lasso for multi-task feature selection. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 988–995
Zhu L, Deng R, Maire M, Deng Z, Mori G, Tan P (2018) Sparsely aggregated convolutional networks. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 186–201
Zhu X, Zhou W, Li H (2018) Improving deep neural network sparsity through decorrelation regularization. In: International Joint Conference on Artificial Intelligence, pp 3264–3270
Acknowledgements
The authors are supported by the National Natural Science Foundation of China (Nos. 61976174, 11671317).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest. This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: Experiment settings for compared models
LAP Lookahead pruning (LAP) [34] prunes redundant neurons or filters using a lookahead distortion measure that takes each layer's neighboring layers into account. In this set of experiments, we use DenseNet as the baseline model and denote the pruned version as LAP-DenseNet. In CondenseNet-elasso, the pruning ratio is 75% for all 1 × 1 convolutional layers and 50% for the final fully connected layer; all 3 × 3 convolutional layers (except the initial convolutional layer) use group convolution with 4 groups, which is equivalent to pruning 75% of the filters. We therefore prune LAP-DenseNets under the same setting. In addition, as in CondenseNet-elasso, the convolutional layers inside transition blocks are not pruned. The training steps for pre-training and retraining are [120k, 40k] for CIFAR and [80k, 26k] for Tiny ImageNet. Preliminary experiments show that a cosine learning rate schedule performs better than a fixed one, so we use a cosine schedule that starts at 0.1 and gradually decays to 0 (see the sketch below). We test LAP-DenseNet with 50, 86 and 122 layers on CIFAR and LAP-DenseNet with 52 and 88 layers on Tiny ImageNet. Experiment results in Tables 1 and 2 show that our model performs better than lookahead pruning on DenseNet.
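For reference, the following is a minimal sketch of such a cosine annealing schedule (the function name and the stand-alone loop are illustrative; in the experiments the schedule runs inside the usual training loop):

```python
import math

def cosine_lr(step, total_steps, lr_max=0.1, lr_min=0.0):
    """Cosine learning rate: decays from lr_max at step 0 to lr_min at total_steps."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * step / total_steps))

# Example: the 120k pre-training steps used for CIFAR in this appendix.
total_steps = 120_000
for step in (0, 30_000, 60_000, 90_000, 120_000):
    print(step, round(cosine_lr(step, total_steps), 4))  # 0.1, 0.0854, 0.05, 0.0146, 0.0
```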
Hinge Hinge [28] compresses the whole network jointly at a given target FLOPs compression ratio. Each heavyweight convolution is converted into a lightweight convolution followed by a linear projection; for example, in the basic building block of DenseNet (shown in Fig. 2a), a 1 × 1 convolution is added after each 3 × 3 convolution to select important output filters. Group sparsity is introduced through a sparsity-inducing matrix that is penalized with an \(L_{1}\) or \(L_{1/2}\) norm and optimized through proximal gradient descent (a sketch of the proximal step follows this paragraph). In this set of experiments, we follow the original implementation of Hinge (https://github.com/ofsoundof/group_sparsity); the unpruned model is denoted as Hinge-DenseNet and the pruned model as Hinge-DenseNet-pruned. The target FLOPs pruning ratio is selected to be comparable to our baseline models. The model configuration is as follows: Hinge-DenseNet-28 has {8,8,8} dense blocks in each stage, Hinge-DenseNet-58 has {18,18,18} dense blocks in each stage, and the growth rate is set to {8,16,32}. The searching and converging epochs are set to 200 and 300, respectively. Experiment results in Tables 1 and 2 show that our model performs better than Hinge-DenseNet under similar computation settings.
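As a rough illustration of the proximal step behind such group-sparsity penalties, the sketch below implements the standard group-lasso (column-wise \(L_{2}\)) proximal operator on a sparsity-inducing matrix; Hinge's \(L_{1/2}\) variant has a different closed-form shrinkage, and the function name here is ours, not taken from the Hinge code:

```python
import numpy as np

def prox_group_lasso(A, lam, step_size):
    """Proximal operator of the group-lasso penalty, treating each column
    of the sparsity-inducing matrix A as one group: columns whose norm
    falls below lam * step_size are zeroed out (the whole filter is
    pruned); the remaining columns are shrunk toward zero."""
    out = A.copy()
    thresh = lam * step_size
    norms = np.linalg.norm(out, axis=0)  # one L2 norm per column (group)
    for j, n in enumerate(norms):
        if n <= thresh:
            out[:, j] = 0.0                  # group removed entirely
        else:
            out[:, j] *= 1.0 - thresh / n    # group soft-thresholding
    return out
```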
Appendix B: Covid-19 X-ray image classification
In this section, we evaluate the proposed model on Covid-19 X-ray images. The COVID-Xray-5k dataset was constructed in the DeepCovid paper [33]; its training set consists of 2000 non-covid examples and 84 covid examples, and its validation set consists of 3000 non-covid examples and 100 covid examples. The authors of [33] take models pre-trained on ImageNet2012 and use transfer learning to fine-tune them on the training images of COVID-Xray-5k. The models predict a probability score for each image; a threshold is selected, and any sample with a probability higher than the threshold is classified as COVID-19. The paper reports model performance with the following two metrics:

$$\begin{aligned} \text {Sensitivity}&= \frac{\text {Number of images correctly predicted as COVID-19}}{\text {Number of total COVID images}} \\ \text {Specificity}&= \frac{\text {Number of images correctly predicted as non-COVID}}{\text {Number of total non-COVID images}} \end{aligned}$$

Following the training schedule of DeepCovid, we first train the models on ImageNet2012 and then use transfer learning to retrain the last fully connected layer from 1000 classes to 2 classes (covid and non-covid). We train three models for comparison: DenseNet-G, CondenseNet and CondenseNet-elasso; all models have {4,6,8,10,8} dense blocks in each stage and the growth rate is set to {8,16,32,64,128}. The pre-trained models are validated on the whole ImageNet2012 validation set, with results shown in Table 4. Our model achieves 4.23% and 0.42% lower top-1 error rates than DenseNet-G and CondenseNet, respectively, under the same computation settings. At the transfer stage, we use a learning rate of 0.0005 for CondenseNet and CondenseNet-elasso and 0.001 for DenseNet-G. Table 5 reports sensitivity and specificity under different threshold levels: the proposed CondenseNet-elasso achieves much higher sensitivity and comparable specificity compared with DenseNet-G and CondenseNet. The experiment results on both the ImageNet2012 classification task and Covid-19 X-ray image classification show that our model performs better than its baselines, CondenseNet and DenseNet-G.
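For concreteness, the two metrics above can be computed from the predicted probability scores as in the following sketch (array names and the threshold values are illustrative):

```python
import numpy as np

def sensitivity_specificity(probs, labels, threshold):
    """Sensitivity and specificity at a given probability threshold.
    probs: predicted P(COVID-19) per image; labels: 1 = COVID-19, 0 = non-COVID."""
    preds = np.asarray(probs) >= threshold
    labels = np.asarray(labels).astype(bool)
    tp = np.sum(preds & labels)     # correctly predicted COVID-19
    fn = np.sum(~preds & labels)    # missed COVID-19
    tn = np.sum(~preds & ~labels)   # correctly predicted non-COVID
    fp = np.sum(preds & ~labels)    # false alarms
    return tp / (tp + fn), tn / (tn + fp)

# Example: sweep several thresholds, as in Table 5.
# for t in (0.1, 0.2, 0.3, 0.4):
#     print(t, sensitivity_specificity(probs, labels, t))
```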
Cite this article
Ji, L., Zhang, J., Zhang, C. et al. CondenseNet with exclusive lasso regularization. Neural Comput & Applic 33, 16197–16212 (2021). https://doi.org/10.1007/s00521-021-06222-0