Abstract
Deep neural networks (DNNs) are attracting increasing attention in machine learning. However, information propagation becomes increasingly difficult as networks get deeper, which makes optimizing DNNs extremely hard. One reason for this difficulty is the saturation of hidden units. In this paper, we propose a novel method named RegA to reduce the influence of saturation on ReLU-DNNs (DNNs with ReLU activations). Instead of changing the activation functions or the initialization strategy, our method explicitly encourages the pre-activations to lie outside the saturation region. Specifically, we add an auxiliary objective induced by the L2-norm of the pre-activation values to the optimization problem. This auxiliary objective helps activate more units and promotes effective information propagation in ReLU-DNNs. Through experiments on several large-scale real datasets, we demonstrate that RegA learns better representations and helps ReLU-DNNs achieve better convergence and accuracy.
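To make the idea concrete, below is a minimal PyTorch sketch of training with an auxiliary pre-activation objective. The abstract does not give the exact form of the RegA term, so the penalty shown here is one plausible instantiation: it penalizes the squared L2-norm of the negative (saturated) part of each pre-activation, nudging saturated ReLU units back toward the active region. The names RegAMLP, aux_saturation_penalty, and the weight lam are illustrative assumptions, not the paper's API.

import torch
import torch.nn as nn

class RegAMLP(nn.Module):
    # Toy MLP that exposes pre-activations so an auxiliary
    # saturation penalty can be added to the task loss.
    def __init__(self, dims=(784, 256, 256, 10)):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Linear(d_in, d_out) for d_in, d_out in zip(dims[:-1], dims[1:])
        )

    def forward(self, x):
        pre_acts = []                 # pre-activation values z = Wx + b
        for layer in self.layers[:-1]:
            z = layer(x)
            pre_acts.append(z)
            x = torch.relu(z)         # ReLU saturates (zero gradient) for z <= 0
        return self.layers[-1](x), pre_acts

def aux_saturation_penalty(pre_acts, lam=1e-4):
    # Hypothetical instantiation of the auxiliary objective: penalize
    # the squared L2-norm of the negative part of each pre-activation,
    # encouraging units to leave the saturation region. The exact term
    # in the paper may differ.
    return lam * sum((torch.clamp(z, max=0.0) ** 2).mean() for z in pre_acts)

model = RegAMLP()
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
logits, pre_acts = model(x)
loss = nn.functional.cross_entropy(logits, y) + aux_saturation_penalty(pre_acts)
loss.backward()

Because the auxiliary term depends only on intermediate activations, it composes with any task loss and architecture that exposes its pre-activation values.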
Acknowledgements
This work was supported by the National Basic Research Program of China (973 Program) under Grant No. 2013CB329404, the National Natural Science Foundation of China under Grant Nos. 61572393, 91230101, 61075006, 11131006, 11201367 and 11501049, and the Research Fund for the Doctoral Program of Higher Education of China under Grant No. 20100201120048.
Cite this article
Shi, G., Zhang, J., Li, H. et al. Enhance the Performance of Deep Neural Networks via L2 Regularization on the Input of Activations. Neural Process Lett 50, 57–75 (2019). https://doi.org/10.1007/s11063-018-9883-8