Abstract
Deep neural networks (DNNs) are attracting increasing attention in machine learning. However, information propagation becomes increasingly difficult as networks get deeper, which makes optimizing DNNs extremely hard. One reason for this difficulty is the saturation of hidden units. In this paper, we propose a novel method named RegA to reduce the influence of saturation on ReLU-DNNs (DNNs with ReLU activations). Instead of changing the activation functions or the initialization strategy, our method explicitly encourages the pre-activations to lie outside the saturation region. Specifically, we add an auxiliary objective induced by the L2-norm of the pre-activation values to the optimization problem. This auxiliary objective helps activate more units and promotes effective information propagation in ReLU-DNNs. Through experiments on several large-scale real datasets, we demonstrate that RegA learns better representations and helps ReLU-DNNs achieve better convergence and accuracy.
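To make the idea concrete, below is a minimal PyTorch sketch of training with an auxiliary pre-activation objective. The abstract does not give the exact form of the RegA term, so the penalty shown here is one plausible instantiation: it penalizes the squared L2-norm of the negative (saturated) part of each pre-activation, nudging saturated ReLU units back toward the active region. The names RegAMLP, aux_saturation_penalty, and the weight lam are illustrative assumptions, not the paper's API.

import torch
import torch.nn as nn

class RegAMLP(nn.Module):
    # Toy MLP that exposes pre-activations so an auxiliary
    # saturation penalty can be added to the task loss.
    def __init__(self, dims=(784, 256, 256, 10)):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Linear(d_in, d_out) for d_in, d_out in zip(dims[:-1], dims[1:])
        )

    def forward(self, x):
        pre_acts = []                 # pre-activation values z = Wx + b
        for layer in self.layers[:-1]:
            z = layer(x)
            pre_acts.append(z)
            x = torch.relu(z)         # ReLU saturates (zero gradient) for z <= 0
        return self.layers[-1](x), pre_acts

def aux_saturation_penalty(pre_acts, lam=1e-4):
    # Hypothetical instantiation of the auxiliary objective: penalize
    # the squared L2-norm of the negative part of each pre-activation,
    # encouraging units to leave the saturation region. The exact term
    # in the paper may differ.
    return lam * sum((torch.clamp(z, max=0.0) ** 2).mean() for z in pre_acts)

model = RegAMLP()
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
logits, pre_acts = model(x)
loss = nn.functional.cross_entropy(logits, y) + aux_saturation_penalty(pre_acts)
loss.backward()

Because the auxiliary term depends only on intermediate activations, it composes with any task loss and architecture that exposes its pre-activation values.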
Acknowledgements
This work was supported by the National Basic Research Program of China (973 Program) under Grant No. 2013CB329404, the National Natural Science Foundation of China under Grant Nos. 61572393, 91230101, 61075006, 11131006, 11201367 and 11501049, and the Research Fund for the Doctoral Program of Higher Education of China under Grant No. 20100201120048.
Cite this article
Shi, G., Zhang, J., Li, H. et al. Enhance the Performance of Deep Neural Networks via L2 Regularization on the Input of Activations. Neural Process Lett 50, 57–75 (2019). https://doi.org/10.1007/s11063-018-9883-8