
Negative effects of sufficiently small initial weights on back-propagation neural networks

Journal of Zhejiang University SCIENCE C

Abstract

In the training of feedforward neural networks, it is usually suggested that the initial weights should be small in magnitude in order to prevent premature saturation. The aim of this paper is to point out the other side of the story: in some cases, the gradient of the error function is zero not only for infinitely large weights but also for zero weights. Slow convergence at the beginning of the training procedure is often the result of sufficiently small initial weights. Therefore, we suggest that, in these cases, the initial weights should be neither too large nor too small. For instance, a typical choice of range for the initial weights might be (−0.4, −0.1) ∪ (0.1, 0.4), rather than (−0.1, 0.1) as suggested by the usual strategy. Our theory that medium-sized weights should be used is also extended to several commonly used transfer functions and error functions. Numerical experiments are carried out to support our theoretical findings.
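
As a concrete illustration of the suggested initialization range, the sketch below samples weights uniformly from (−0.4, −0.1) ∪ (0.1, 0.4) by drawing a magnitude and a random sign, alongside the usual small-weight strategy of sampling uniformly from (−0.1, 0.1). This is only a minimal sketch of the idea described in the abstract, not the authors' code; the function names and the NumPy-based setup are assumptions made here for illustration.

```python
import numpy as np

def init_medium_weights(shape, low=0.1, high=0.4, rng=None):
    """Hypothetical helper: sample weights uniformly from
    (-high, -low) U (low, high), avoiding magnitudes below `low`."""
    rng = np.random.default_rng() if rng is None else rng
    magnitudes = rng.uniform(low, high, size=shape)  # magnitudes in (low, high)
    signs = rng.choice([-1.0, 1.0], size=shape)      # random sign for each weight
    return signs * magnitudes

def init_small_weights(shape, scale=0.1, rng=None):
    """The usual small-weight strategy for comparison: uniform on (-scale, scale)."""
    rng = np.random.default_rng() if rng is None else rng
    return rng.uniform(-scale, scale, size=shape)

# Example: initialize a 10-by-5 hidden-layer weight matrix both ways
W_medium = init_medium_weights((10, 5))
W_small = init_small_weights((10, 5))
print(np.abs(W_medium).min())  # >= 0.1 by construction
print(np.abs(W_small).min())   # can be arbitrarily close to 0
```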



Author information

Correspondence to Wei Wu.

Additional information

Project supported by the National Natural Science Foundation of China (No. 11171367) and the Fundamental Research Funds for the Central Universities, China

About this article

Cite this article

Liu, Y., Yang, J., Li, L. et al. Negative effects of sufficiently small initial weights on back-propagation neural networks. J. Zhejiang Univ. - Sci. C 13, 585–592 (2012). https://doi.org/10.1631/jzus.C1200008

