
Negative effects of sufficiently small initial weights on back-propagation neural networks

Journal of Zhejiang University SCIENCE C

Abstract

In the training of feedforward neural networks, it is usually suggested that the initial weights should be small in magnitude in order to prevent premature saturation. The aim of this paper is to point out the other side of the story: in some cases, the gradient of the error function is zero not only for infinitely large weights but also for zero weights. Slow convergence at the beginning of the training procedure is often the result of sufficiently small initial weights. Therefore, we suggest that, in these cases, the initial weights should be neither too large nor too small. For instance, a typical choice of range for the initial weights might be (−0.4, −0.1) ∪ (0.1, 0.4), rather than (−0.1, 0.1) as suggested by the usual strategy. Our theory that medium-sized weights should be used is also extended to several commonly used transfer functions and error functions. Numerical experiments are carried out to support our theoretical findings.
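
As a concrete illustration of the suggested initialization range, the sketch below samples weights uniformly from (−0.4, −0.1) ∪ (0.1, 0.4) by drawing a magnitude and a random sign, alongside the usual small-weight strategy of sampling uniformly from (−0.1, 0.1). This is only a minimal sketch of the idea described in the abstract, not the authors' code; the function names and the NumPy-based setup are assumptions made here for illustration.

```python
import numpy as np

def init_medium_weights(shape, low=0.1, high=0.4, rng=None):
    """Hypothetical helper: sample weights uniformly from
    (-high, -low) U (low, high), avoiding magnitudes below `low`."""
    rng = np.random.default_rng() if rng is None else rng
    magnitudes = rng.uniform(low, high, size=shape)  # magnitudes in (low, high)
    signs = rng.choice([-1.0, 1.0], size=shape)      # random sign for each weight
    return signs * magnitudes

def init_small_weights(shape, scale=0.1, rng=None):
    """The usual small-weight strategy for comparison: uniform on (-scale, scale)."""
    rng = np.random.default_rng() if rng is None else rng
    return rng.uniform(-scale, scale, size=shape)

# Example: initialize a 10-by-5 hidden-layer weight matrix both ways
W_medium = init_medium_weights((10, 5))
W_small = init_small_weights((10, 5))
print(np.abs(W_medium).min())  # >= 0.1 by construction
print(np.abs(W_small).min())   # can be arbitrarily close to 0
```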



Author information

Correspondence to Wei Wu.

Additional information

Project supported by the National Natural Science Foundation of China (No. 11171367) and the Fundamental Research Funds for the Central Universities, China

About this article

Cite this article

Liu, Y., Yang, J., Li, L. et al. Negative effects of sufficiently small initial weights on back-propagation neural networks. J. Zhejiang Univ. - Sci. C 13, 585–592 (2012). https://doi.org/10.1631/jzus.C1200008

