Abstract
Proper initialization is one of the most important prerequisites for fast convergence of feed-forward neural networks such as high order and multilayer perceptrons. This publication aims to determine the optimal value of the initial weight variance (or range), which is the principal parameter of random weight initialization methods for both types of neural networks.
An overview of random weight initialization methods for multilayer perceptrons is presented. These methods are extensively tested using eight real-world benchmark data sets and a broad range of initial weight variances, by means of more than 30,000 simulations, with the aim of finding the best weight initialization method for multilayer perceptrons.
For high order networks, a large number of experiments (more than 200,000 simulations) were performed, using three weight distributions, three activation functions, several network orders, and the same eight data sets. The results of these experiments are compared to weight initialization techniques for multilayer perceptrons, which leads to the proposal of a suitable weight initialization method for high order perceptrons.
The conclusions on the weight initialization methods for both types of networks are justified by sufficiently small confidence intervals of the mean convergence times.
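As a concrete illustration of the range parameter discussed in the abstract, the following minimal sketch draws an MLP weight matrix uniformly from [-r, r]. The function name, layer sizes, and the value r = 0.5 are illustrative assumptions only, not the paper's recommended setting.

import numpy as np

def init_uniform(fan_in, fan_out, r, rng=None):
    # Draw a weight matrix uniformly from [-r, r].
    # The range r (equivalently the variance r**2 / 3) is the single
    # parameter whose optimal value is investigated empirically here.
    rng = np.random.default_rng() if rng is None else rng
    return rng.uniform(-r, r, size=(fan_out, fan_in))

# Example: a layer with 64 inputs and 32 units, initialized with range 0.5.
W = init_uniform(64, 32, r=0.5)
print(W.std())  # close to r / sqrt(3), i.e. roughly 0.29

For a uniform distribution on [-r, r] the variance is r^2 / 3, so specifying either the range or the variance fixes the other.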
© 1995 Springer-Verlag Berlin Heidelberg
Cite this paper
Thimm, G., Fiesler, E. (1995). Neural network initialization. In: Mira, J., Sandoval, F. (eds) From Natural to Artificial Neural Computation. IWANN 1995. Lecture Notes in Computer Science, vol 930. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-59497-3_220
DOI: https://doi.org/10.1007/3-540-59497-3_220
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-59497-0
Online ISBN: 978-3-540-49288-7