
Neural network initialization

From Natural to Artificial Neural Computation (IWANN 1995)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 930)


Abstract

Proper initialization is one of the most important prerequisites for fast convergence of feed-forward neural networks such as high order and multilayer perceptrons. This publication aims to determine the optimal value of the initial weight variance (or range), which is the principal parameter of random weight initialization methods for both types of neural networks.
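As a concrete illustration (not taken from the paper itself), the sketch below shows how one layer's weights could be drawn at random with a prescribed variance, either from a zero-mean normal distribution or from a symmetric uniform range; the function name init_weights and all parameter choices are illustrative assumptions.

```python
# Hedged sketch (not the paper's exact procedure): draw a weight matrix for one
# fully connected layer with a prescribed variance, from either a zero-mean
# normal distribution or a symmetric uniform range. All names are illustrative.
import numpy as np

def init_weights(fan_in, fan_out, variance, distribution="uniform", rng=None):
    """Return a (fan_in, fan_out) weight matrix whose entries have the given variance."""
    rng = np.random.default_rng() if rng is None else rng
    if distribution == "normal":
        # The standard deviation is the square root of the requested variance.
        return rng.normal(0.0, np.sqrt(variance), size=(fan_in, fan_out))
    # A zero-mean uniform distribution on [-r, r] has variance r^2 / 3,
    # so the half-width giving the requested variance is r = sqrt(3 * variance).
    r = np.sqrt(3.0 * variance)
    return rng.uniform(-r, r, size=(fan_in, fan_out))

# Example: a layer with 10 inputs and 5 units, initial weight variance 0.01.
W = init_weights(10, 5, variance=0.01)
```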

An overview of random weight initialization methods for multilayer perceptrons is presented. These methods are extensively tested using eight real-world benchmark data sets and a broad range of initial weight variances, by means of more than 30,000 simulations, with the aim of finding the best weight initialization method for multilayer perceptrons.

For high order networks, a large number of experiments (more than 200,000 simulations) were performed, using three weight distributions, three activation functions, several network orders, and the same eight data sets. The results of these experiments are compared to weight initialization techniques for multilayer perceptrons, which leads to the proposal of a suitable weight initialization method for high order perceptrons.
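To make the term "high order" concrete, here is a hedged sketch of what a second-order perceptron unit computes: a weighted sum over the inputs and all pairwise input products, passed through an activation function. The paper varies the network order and the activation function; this sketch fixes order two and a logistic sigmoid, and second_order_unit is an illustrative name, not the paper's code.

```python
# Hedged sketch of a single second-order ("high order") perceptron unit: a
# weighted sum over the inputs plus all pairwise input products, passed through
# a logistic sigmoid. The paper varies order and activation; names are illustrative.
import numpy as np
from itertools import combinations_with_replacement

def second_order_unit(x, w1, w2, bias):
    """x: inputs; w1: first-order weights; w2: weights for products x_i * x_j (i <= j)."""
    pairs = [x[i] * x[j] for i, j in combinations_with_replacement(range(len(x)), 2)]
    net = bias + np.dot(w1, x) + np.dot(w2, pairs)
    return 1.0 / (1.0 + np.exp(-net))  # logistic activation

# Example with three inputs: 3 first-order weights and 6 pairwise-product weights,
# all drawn from a zero-mean normal distribution with variance 0.01.
rng = np.random.default_rng(0)
x = np.array([0.2, -0.5, 1.0])
print(second_order_unit(x, rng.normal(0, 0.1, 3), rng.normal(0, 0.1, 6), 0.0))
```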

The conclusions on the weight initialization methods for both types of networks are justified by sufficiently small confidence intervals of the mean convergence times.
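The abstract does not restate which confidence interval is used; for reference, the standard Student-t interval for the mean convergence time over n independent training runs would be:

```latex
% Standard Student-t confidence interval for the mean convergence time over
% n independent runs, with sample mean \bar{t} and sample standard deviation s
% (assumption: the paper does not restate its exact interval here).
\bar{t} \;\pm\; t_{1-\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}
```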





Editor information

José Mira, Francisco Sandoval


Copyright information

© 1995 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Thimm, G., Fiesler, E. (1995). Neural network initialization. In: Mira, J., Sandoval, F. (eds) From Natural to Artificial Neural Computation. IWANN 1995. Lecture Notes in Computer Science, vol 930. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-59497-3_220


  • DOI: https://doi.org/10.1007/3-540-59497-3_220

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-59497-0

  • Online ISBN: 978-3-540-49288-7

  • eBook Packages: Springer Book Archive
