Abstract
Proper initialization is one of the most important prerequisites for fast convergence of feed-forward neural networks such as high order and multilayer perceptrons. This publication aims to determine the optimal value of the initial weight variance (or range), which is the principal parameter of random weight initialization methods for both types of neural networks.
An overview of random weight initialization methods for multilayer perceptrons is presented. These methods are extensively tested using eight real-world benchmark data sets and a broad range of initial weight variances, by means of more than 30,000 simulations, with the aim of finding the best weight initialization method for multilayer perceptrons.
For high order networks, a large number of experiments (more than 200,000 simulations) were performed, using three weight distributions, three activation functions, several network orders, and the same eight data sets. The results of these experiments are compared to weight initialization techniques for multilayer perceptrons, which leads to the proposal of a suitable weight initialization method for high order perceptrons.
The conclusions on the weight initialization methods for both types of networks are justified by sufficiently small confidence intervals of the mean convergence times.
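As a concrete illustration of the range parameter discussed in the abstract, the following minimal sketch draws an MLP weight matrix uniformly from [-r, r]. The function name, layer sizes, and the value r = 0.5 are illustrative assumptions only, not the paper's recommended setting.

import numpy as np

def init_uniform(fan_in, fan_out, r, rng=None):
    # Draw a weight matrix uniformly from [-r, r].
    # The range r (equivalently the variance r**2 / 3) is the single
    # parameter whose optimal value is investigated empirically here.
    rng = np.random.default_rng() if rng is None else rng
    return rng.uniform(-r, r, size=(fan_out, fan_in))

# Example: a layer with 64 inputs and 32 units, initialized with range 0.5.
W = init_uniform(64, 32, r=0.5)
print(W.std())  # close to r / sqrt(3), i.e. roughly 0.29

For a uniform distribution on [-r, r] the variance is r^2 / 3, so specifying either the range or the variance fixes the other.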
© 1995 Springer-Verlag Berlin Heidelberg
Cite this paper
Thimm, G., Fiesler, E. (1995). Neural network initialization. In: Mira, J., Sandoval, F. (eds) From Natural to Artificial Neural Computation. IWANN 1995. Lecture Notes in Computer Science, vol 930. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-59497-3_220
DOI: https://doi.org/10.1007/3-540-59497-3_220
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-59497-0
Online ISBN: 978-3-540-49288-7