Enhance the Performance of Deep Neural Networks via L2 Regularization on the Input of Activations


Abstract

Deep neural networks (DNNs) are attracting increasing attention in machine learning. However, information propagation becomes more difficult as networks get deeper, which makes the optimization of DNNs extremely hard. One reason for this difficulty is the saturation of hidden units. In this paper, we propose a novel method named RegA to reduce the influence of saturation on ReLU-DNNs (DNNs with ReLU activations). Instead of changing the activation functions or the initialization strategy, our method explicitly encourages the pre-activations to lie outside the saturation region. Specifically, we add an auxiliary objective, induced by the L2-norm of the pre-activation values, to the optimization problem. This auxiliary objective helps activate more units and promotes effective information propagation in ReLU-DNNs. Through experiments on several large-scale real datasets, we demonstrate that RegA learns better representations and helps ReLU-DNNs achieve better convergence and accuracy.
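As a rough illustration of the auxiliary objective described above, the sketch below adds an L2 penalty on the hidden pre-activations to a standard classification loss. It is a minimal sketch assuming a PyTorch implementation; the network, the penalty weight lam, and the helper train_step are illustrative choices and not the exact configuration used in the paper, which may weight the term differently or apply it per layer.

import torch
import torch.nn as nn

class PreActMLP(nn.Module):
    # Small ReLU network that exposes the hidden pre-activations so an
    # L2 penalty can be added to the loss (illustrative, not the paper's exact model).
    def __init__(self, in_dim=784, hidden_dim=256, out_dim=10):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, out_dim)

    def forward(self, x):
        z1 = self.fc1(x)              # pre-activation: the input of the ReLU
        h1 = torch.relu(z1)
        return self.fc2(h1), z1       # return z1 for the auxiliary objective

model = PreActMLP()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
lam = 1e-4                            # assumed penalty weight, not taken from the paper

def train_step(x, y):
    optimizer.zero_grad()
    logits, z1 = model(x)
    penalty = z1.pow(2).mean()        # L2-norm-based term on the pre-activation values
    loss = criterion(logits, y) + lam * penalty
    loss.backward()
    optimizer.step()
    return loss.item()

In practice the penalty weight would be tuned on validation data, since an overly large value drives all pre-activations toward zero rather than merely discouraging saturation.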


Acknowledgements

This work is supported by the National Basic Research Program of China (973 Program) under Grant No. 2013CB329404, the National Natural Science Foundation of China under Grant Nos. 61572393, 91230101, 61075006, 11131006, 11201367, and 11501049, and the Research Fund for the Doctoral Program of Higher Education of China under Grant No. 20100201120048.

Author information


Corresponding author

Correspondence to Jiangshe Zhang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Shi, G., Zhang, J., Li, H. et al. Enhance the Performance of Deep Neural Networks via L2 Regularization on the Input of Activations. Neural Process Lett 50, 57–75 (2019). https://doi.org/10.1007/s11063-018-9883-8
