CondenseNet with exclusive lasso regularization

  • Original Article
  • Published in Neural Computing and Applications

Abstract

Group convolution has been widely used in the deep learning community to achieve computational efficiency. In this paper, we develop CondenseNet-elasso to eliminate feature correlation among different convolution groups and alleviate the neural network's overfitting problem. It applies exclusive lasso regularization to CondenseNet. The exclusive lasso regularizer encourages different convolution groups to use different subsets of input channels and therefore to learn more diversified features. Our experimental results on CIFAR10, CIFAR100 and Tiny ImageNet show that CondenseNets-elasso are more efficient than CondenseNets and other DenseNet variants.
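For readers who want a concrete picture of the regularizer described above, the snippet below is a minimal PyTorch sketch of an exclusive-lasso-style penalty on a 1 × 1 learned group convolution, in the spirit of [22, 56]: for each input channel it sums the per-group weight norms and squares the sum, so each input channel tends to be used by only a few convolution groups. The grouping, norm choice, layer shapes and coefficient are illustrative assumptions, not the exact regularizer used in the paper.

```python
import torch

def exclusive_lasso_penalty(weight: torch.Tensor, num_groups: int) -> torch.Tensor:
    """Illustrative exclusive-lasso-style penalty on a 1x1 group convolution.

    weight: (out_channels, in_channels, 1, 1) weight of a 1x1 convolution whose
    output filters are split into `num_groups` groups. For every input channel,
    the l2 norms of its weights inside each group are summed over groups and the
    sum is squared; summed over input channels, this encourages each input
    channel to be picked up by only a few groups, i.e. different groups use
    different subsets of input channels.
    """
    out_channels, in_channels = weight.shape[:2]
    w = weight.reshape(num_groups, out_channels // num_groups, in_channels, -1)
    per_group_norm = w.pow(2).sum(dim=(1, 3)).sqrt()      # (num_groups, in_channels)
    return (per_group_norm.sum(dim=0) ** 2).sum()

# Usage: add the penalty to the task loss with a small coefficient (illustrative value).
conv = torch.nn.Conv2d(64, 32, kernel_size=1, bias=False)
reg_loss = 1e-4 * exclusive_lasso_penalty(conv.weight, num_groups=4)
```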




References

  1. Campbell F, Allen GI et al (2017) Within group variable selection through the exclusive lasso. Electron J Statist 11(2):4220–4257


  2. Changpinyo S, Sandler M, Zhmoginov A (2017) The power of sparsity in convolutional neural networks. arXiv preprint arXiv:1702.06257

  3. Chen Y, Li J, Xiao H, Jin X, Yan S, Feng J (2017) Dual path networks. In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R (eds) Advances in neural information processing systems, vol 30. Curran Associates Inc, pp. 4467–4475. https://proceedings.neurips.cc/paper/2017/file/f7e0b956540676a129760a3eae309294-Paper.pdf

  4. Cogswell M, Ahmed F, Girshick R, Zitnick L, Batra D (2015) Reducing overfitting in deep networks by decorrelating representations. arXiv preprint arXiv:1511.06068

  5. Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L (2009) Imagenet: A large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition, pp. 248–255. IEEE

  6. Denil M, Shakibi B, Dinh L, Ranzato M, De Freitas N (2013) Predicting parameters in deep learning. In: Burges CJC, Bottou L, Welling M, Ghahramani Z, Weinberger KQ (eds) Advances in neural information processing systems, vol 26, Curran Associates, Inc., pp. 2148–2156. https://proceedings.neurips.cc/paper/2013/file/7fec306d1e665bc9c748b5d2b99a6e97-Paper.pdf

  7. Dong W, Wu J, Bai Z, Hu Y, Li W, Qiao W, Woźniak M (2021) Mobilegcn applied to low-dimensional node feature learning. Pattern Recognit 112:107788


  8. Gretton A, Bousquet O, Smola A, Schölkopf B (2005) Measuring statistical dependence with hilbert-schmidt norms. In: International conference on algorithmic learning theory, pp. 63–77. Springer

  9. Guo Q, Wu XJ, Kittler J, Feng Z (2020) Self-grouping convolutional neural networks. Neural Netw 132:491–505


  10. Han S, Pool J, Tran J, Dally W (2015) Learning both weights and connections for efficient neural network. In: Cortes C, Lawrence N, Lee D, Sugiyama M, Garnett R (eds) Advances in neural information processing systems, vol 28. Curran Associates, Inc., pp. 1135–1143. https://proceedings.neurips.cc/paper/2015/file/ae0eb3eed39d2bcef4622b2499a05fe6-Paper.pdf

  11. He K, Zhang X, Ren S, Sun J (2015) Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In: Proceedings of the IEEE international conference on computer vision, pp. 1026–1034

  12. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778

  13. He Y, Kang G, Dong X, Fu Y, Yang Y (2018) Soft filter pruning for accelerating deep convolutional neural networks. arXiv preprint arXiv:1808.06866

  14. Hu H, Dey D, Del Giorno A, Hebert M, Bagnell JA (2017) Log-densenet: How to sparsify a densenet. arXiv preprint arXiv:1711.00002

  15. Hu H, Peng R, Tai YW, Tang CK (2016) Network trimming: A data-driven neuron pruning approach towards efficient deep architectures. arXiv preprint arXiv:1607.03250

  16. Huang G, Liu S, Van der Maaten L, Weinberger KQ (2018) Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2752–2761

  17. Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4700–4708

  18. Ioannou Y, Robertson D, Cipolla R, Criminisi A (2017) Deep roots: Improving cnn efficiency with hierarchical filter groups. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1231–1240

  19. Ioffe S, Szegedy C (2015) Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167

  20. Jiang F, Grigorev A, Rho S, Tian Z, Fu Y, Jifara W, Adil K, Liu S (2018) Medical image semantic segmentation based on deep learning. Neural Comput Appl 29(5):1257–1265


  21. Ke Q, Zhang J, Wei W, Połap D, Woźniak M, Kośmider L, Damaševĭcius R (2019) A neuro-heuristic approach for recognition of lung diseases from x-ray images. Expert Syst Appl 126:218–232


  22. Kong D, Fujimaki R, Liu J, Nie F, Ding C (2014) Exclusive feature learning on arbitrary structures via \(\ell _{1,2}\) -norm. Adv Neural Inf Process Syst 27:1655–1663


  23. Kornblith S, Norouzi M, Lee H, Hinton G (2019) Similarity of neural network representations revisited. arXiv preprint arXiv:1905.00414

  24. Krizhevsky A, Hinton G et al (2009) Learning multiple layers of features from tiny images. Tech. rep, Citeseer


  25. Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Pereira F, Burges CJC, Bottou L, Weinberger KQ (eds) Advances in neural information processing systems, vol 25. Curran Associates, Inc., pp. 1097–1105. https://proceedings.neurips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf

  26. Li H, Kadav A, Durdanovic I, Samet H, Graf HP (2016) Pruning filters for efficient convnets. arXiv preprint arXiv:1608.08710

  27. Li X, Chen S, Hu X, Yang J (2019) Understanding the disharmony between dropout and batch normalization by variance shift. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2682–2690

  28. Li Y, Gu S, Mayer C, Gool LV, Timofte R (2020) Group sparsity: The hinge between filter pruning and decomposition for network compression. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8018–8027

  29. Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3431–3440

  30. Loshchilov I, Hutter F (2016) Sgdr: Stochastic gradient descent with warm restarts. arXiv preprint arXiv:1608.03983

  31. Luo JH, Wu J, Lin W (2017) Thinet: A filter level pruning method for deep neural network compression. In: Proceedings of the IEEE international conference on computer vision, pp. 5058–5066

  32. Ma KWD, Lewis J, Kleijn WB (2020) The hsic bottleneck: Deep learning without back-propagation. Proc AAAI Conf Artif Intell 34(4):5085–5092. https://doi.org/10.1609/aaai.v34i04.5950


  33. Minaee S, Kafieh R, Sonka M, Yazdani S, Soufi GJ (2020) Deep-covid: Predicting covid-19 from chest x-ray images using deep transfer learning. Med Image Anal 65:101794


  34. Park S, Lee J, Mo S, Shin J (2020) Lookahead: A far-sighted alternative of magnitude-based pruning. In: International Conference on Learning Representations. https://openreview.net/forum?id=ryl3ygHYDB

  35. Peng B, Tan W, Li Z, Zhang S, Xie D, Pu S (2018) Extreme network compression via filter group approximation. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 300–316

  36. Pleiss G, Chen D, Huang G, Li T, van der Maaten L, Weinberger KQ (2017) Memory-efficient implementation of densenets. arXiv preprint arXiv:1707.06990

  37. Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: Unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 779–788

  38. Ren S, He K, Girshick R, Sun J (2015) Faster r-cnn: towards real-time object detection with region proposal networks. In: Cortes C, Lawrence N, Lee D, Sugiyama M, Garnett R (eds) Advances in neural information processing systems, vol 28. Curran Associates, Inc., pp. 91–99. https://proceedings.neurips.cc/paper/2015/file/14bfa6bb14875e45bba028a21ed38046-Paper.pdf

  39. Scardapane S, Comminiello D, Hussain A, Uncini A (2017) Group sparse regularization for deep neural networks. Neurocomputing 241:81–89


  40. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556

  41. Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15(1):1929–1958


  42. Wan L, Zeiler M, Zhang S, Le Cun Y, Fergus R (2013) Regularization of neural networks using dropconnect. In: International Conference on Machine Learning, pp. 1058–1066

  43. Wang W, Li X, Yang J, Lu T (2018) Mixed link networks. arXiv preprint arXiv:1802.01808

  44. Wang X, Kan M, Shan S, Chen X (2019) Fully learnable group convolution for acceleration of deep neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 9049–9058

  45. Wen W, Wu C, Wang Y, Chen Y, Li H (2016) Learning structured sparsity in deep neural networks. In: Lee D, Sugiyama M, Luxburg U, Guyon I, Garnett R (eds) Advances in neural information processing systems, vol 29. Curran Associates, Inc., pp. 2074–2082. https://proceedings.neurips.cc/paper/2016/file/41bfd20a38bb1b0bec75acf0845530a7-Paper.pdf

  46. Woźniak M, Wieczorek M, Siłka J, Połap D (2020) Body pose prediction based on motion sensor data and recurrent neural network. IEEE Trans Ind Inf 17(3):2101–2111


  47. Xie S, Girshick R, Dollár P, Tu Z, He K (2017) Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1492–1500

  48. Yang Y, Zhong Z, Shen T, Lin Z (2018) Convolutional neural networks with alternately updated clique. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2413–2422

  49. Ye J, Lu X, Lin Z, Wang JZ (2018) Rethinking the smaller-norm-less-informative assumption in channel pruning of convolution layers. arXiv preprint arXiv:1802.00124

  50. Yoon J, Hwang SJ (2017) Combined group and exclusive sparsity for deep neural networks. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 3958–3966. JMLR. org

  51. Yu R, Li A, Chen CF, Lai JH, Morariu VI, Han X, Gao M, Lin CY, Davis LS (2018) Nisp: Pruning networks using neuron importance score propagation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9194–9203

  52. Zhang D, Wang H, Figueiredo M, Balzano L (2018) Learning to share: Simultaneous parameter tying and sparsification in deep learning. In: International Conference on Learning Representations. https://openreview.net/forum?id=rypT3fb0b

  53. Zhang T, Qi GJ, Xiao B, Wang J (2017) Interleaved group convolutions. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4373–4382

  54. Zhang X, Zhou X, Lin M, Sun J (2018) Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856

  55. Zhang Z, Li J, Shao W, Peng Z, Zhang R, Wang X, Luo P (2019) Differentiable learning-to-group channels via groupable convolutional neural networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3542–3551

  56. Zhou Y, Jin R, Hoi SCH (2010) Exclusive lasso for multi-task feature selection. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 988–995

  57. Zhu L, Deng R, Maire M, Deng Z, Mori G, Tan P (2018) Sparsely aggregated convolutional networks. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 186–201

  58. Zhu X, Zhou W, Li H (2018) Improving deep neural network sparsity through decorrelation regularization. In: International Joint Conference on Artificial Intelligence, pp 3264–3270


Acknowledgements

The authors are supported by the National Natural Science Foundation of China (Nos. 61976174, 11671317).

Author information


Corresponding author

Correspondence to Jiangshe Zhang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest. This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Experiment settings for compared models

LAP Lookahead pruning (LAP) [34] prunes redundant neurons or filters according to a lookahead distortion that takes the neighboring layers into account. In this set of experiments, we use DenseNet as the baseline model and denote the pruned version as LAP-DenseNet. In CondenseNet-elasso, the pruning ratio is 75% for all 1 × 1 convolutional layers and 50% for the final fully connected layer, and all 3 × 3 convolutional layers (except the initial convolutional layer) are group convolutions with 4 groups, which is equivalent to pruning 75% of the filters. We therefore prune LAP-DenseNets under the same settings. In addition, we exclude the convolutional layers inside transition blocks, as in CondenseNets-elasso. The training steps for pre-training and retraining are [120k, 40k] for CIFAR and [80k, 26k] for Tiny ImageNet. Preliminary experiments show that a cosine learning rate schedule performs better than a fixed schedule, so we use a cosine schedule that starts at 0.1 and gradually decays to 0. We test LAP-DenseNet with 50, 86 and 122 layers on CIFAR and with 52 and 88 layers on Tiny ImageNet. The results in Tables 1 and 2 show that our model performs better than lookahead pruning applied to DenseNet.
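The cosine learning rate schedule described above can be sketched in PyTorch as follows; the model, momentum, weight decay and step count are placeholders rather than the exact experiment configuration, and per-iteration (step-wise) decay is assumed.

```python
import torch

model = torch.nn.Linear(10, 10)          # placeholder model
total_steps = 120_000 + 40_000           # e.g. CIFAR pre-training + retraining steps

# SGD starting from a learning rate of 0.1 (momentum/weight decay are placeholders).
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)

# Cosine annealing from 0.1 down to 0 over the whole run.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=total_steps, eta_min=0.0)

for step in range(total_steps):
    # ... forward pass, loss.backward(), optimizer.step() would go here ...
    optimizer.zero_grad()
    scheduler.step()                     # decay the learning rate once per iteration
```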


Hinge Hinge [28] compresses the whole network jointly at a given target FLOPs compression ratio. Each original heavyweight convolution is converted into a lightweight convolution followed by a linear projection. For example, in the basic building block of DenseNet (shown in Fig. 2a), a 1 × 1 convolution is added after each 3 × 3 convolution to select important output filters. Group sparsity is enforced by introducing a sparsity-inducing matrix regularized with an \(L_{1}\) or \(L_{1/2}\) norm, and the matrix is optimized through proximal gradient descent. In this set of experiments, we follow the original implementation of Hinge (https://github.com/ofsoundof/group_sparsity); the resulting unpruned model is denoted as Hinge-DenseNet and the pruned model as Hinge-DenseNet-pruned. The target FLOPs pruning ratio is chosen to be comparable to our baseline models. The model configuration is as follows: Hinge-DenseNet-28 has {8, 8, 8} dense blocks per stage, Hinge-DenseNet-58 has {18, 18, 18} dense blocks per stage, and the growth rate is set to {8, 16, 32}. The searching and converging epochs are set to 200 and 300, respectively. The results in Tables 1 and 2 show that our model performs better than Hinge-DenseNet under similar computation settings.
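To make the proximal step concrete, the sketch below applies column-wise group soft-thresholding (the proximal operator of a group-lasso penalty) to the added 1 × 1 projection matrix. The grouping by columns, the placeholder loss and the threshold values are illustrative assumptions, not the exact Hinge implementation, which also supports an \(L_{1/2}\) penalty.

```python
import torch

def group_soft_threshold(P: torch.Tensor, thresh: float) -> torch.Tensor:
    """Proximal operator of a column-wise group-lasso penalty.

    P: (out_channels, hidden_channels) matrix of the added 1x1 convolution; each
    column corresponds to one filter of the preceding 3x3 convolution. Columns
    whose l2 norm falls below `thresh` are zeroed out, marking that filter as
    prunable; the remaining columns are shrunk towards zero.
    """
    col_norm = P.norm(dim=0, keepdim=True)                      # (1, hidden_channels)
    scale = torch.clamp(1.0 - thresh / (col_norm + 1e-12), min=0.0)
    return P * scale

# One proximal-gradient step: gradient step on a (placeholder) data loss, then prox.
P = torch.randn(32, 64, requires_grad=True)
loss = (P ** 2).sum()                                            # placeholder loss
loss.backward()
with torch.no_grad():
    lr, lam = 0.01, 1e-3                                         # illustrative values
    P -= lr * P.grad
    P.copy_(group_soft_threshold(P, thresh=lr * lam))
    P.grad.zero_()
```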

Appendix B: Covid-19 X-ray images classification

In this section, we evaluate our proposed model on Covid-19 X-ray images. The COVID-Xray-5k dataset is constructed in the DeepCovid paper [33]; its training set contains 2000 non-covid examples and 84 covid examples, and its validation set contains 3000 non-covid examples and 100 covid examples. [33] takes models pre-trained on ImageNet2012 and uses transfer learning to fine-tune them on the training images of the COVID-Xray-5k dataset. The models predict a probability score for each image; a threshold is selected, and any sample with a probability higher than the threshold is classified as COVID-19. The paper reports model performance with the following two metrics:

$$\begin{aligned} \text {Sensitivity}&= \frac{\text {Number of images correctly predicted as COVID-19}}{\text {Total number of COVID-19 images}} \\ \text {Specificity}&= \frac{\text {Number of images correctly predicted as non-COVID}}{\text {Total number of non-COVID images}} \end{aligned}$$

Following the training schedule of DeepCovid, we first train the models on ImageNet2012 and then use transfer learning to retrain the last fully connected layer from 1000 classes to two classes (covid and non-covid). We train three models for comparison: DenseNet-G, CondenseNet and CondenseNet-elasso; all models have {4, 6, 8, 10, 8} dense blocks in each stage and the growth rate is set to {8, 16, 32, 64, 128}. The pre-trained models are validated on the whole ImageNet2012 validation set; the results are shown in Table 4. Our model achieves 4.23% and 0.42% lower top-1 error rates than DenseNet-G and CondenseNet, respectively, under the same computation settings. At the transfer stage, we use a learning rate of 0.0005 for CondenseNet and CondenseNet-elasso and 0.001 for DenseNet-G. The sensitivity and specificity under different threshold levels are shown in Table 5, which shows that our proposed CondenseNet-elasso achieves much higher sensitivity and comparable specificity compared with DenseNet-G and CondenseNet. The results on both the ImageNet2012 classification task and the Covid-19 X-ray image classification show that our model performs better than its baselines CondenseNet and DenseNet-G (a minimal sketch of the threshold-based evaluation is given after Table 5).

Table 4 10%-ImageNet Classification Error rate
Table 5 Sensitivity and Specificity on Covid-19 X-Ray Images
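As a concrete illustration of the threshold-based evaluation behind Table 5, the snippet below computes sensitivity and specificity from predicted COVID-19 probabilities at a chosen threshold; the scores, labels and threshold are made-up values for the example, not data from the paper.

```python
import numpy as np

def sensitivity_specificity(probs, labels, threshold):
    """Sensitivity and specificity at a given probability threshold.

    probs: predicted probability of COVID-19 for each image.
    labels: 1 for COVID-19 images, 0 for non-COVID images.
    """
    probs, labels = np.asarray(probs), np.asarray(labels)
    preds = probs >= threshold                      # predicted as COVID-19
    sensitivity = (preds & (labels == 1)).sum() / (labels == 1).sum()
    specificity = (~preds & (labels == 0)).sum() / (labels == 0).sum()
    return sensitivity, specificity

# Example with made-up scores: 3 COVID-19 images, 5 non-COVID images.
probs  = [0.9, 0.4, 0.8, 0.2, 0.1, 0.3, 0.05, 0.6]
labels = [1,   1,   1,   0,   0,   0,   0,    0]
print(sensitivity_specificity(probs, labels, threshold=0.5))  # -> (0.666..., 0.8)
```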


About this article


Cite this article

Ji, L., Zhang, J., Zhang, C. et al. CondenseNet with exclusive lasso regularization. Neural Comput & Applic 33, 16197–16212 (2021). https://doi.org/10.1007/s00521-021-06222-0

