Abstract
Stopped training is a method for avoiding over-fitting in neural network models by preventing an iterative optimization method from converging to a local minimum of the objective function. It is motivated by the observation that over-fitting sets in gradually as training progresses. The stopping time is typically determined by monitoring the model's expected generalization performance, as approximated by the error on a separate validation set. In this paper we propose to use an analytic estimate of generalization performance instead. Such estimates, however, require knowledge of the analytic form of the objective function used to train the network and are valid only when the weights correspond to a local minimum of that objective function. For this reason, we propose the use of an auxiliary, regularized objective function. The resulting algorithm is self-contained and does not require splitting the data into a training set and a separate validation set.
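For contrast with the analytic approach the abstract proposes, the conventional validation-based stopping rule it describes can be sketched as follows. This is a minimal illustration only: the toy regression task, feature map, learning rate, and patience threshold are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression task: noisy linear target fitted with an
# over-parameterized polynomial model, so over-fitting can occur.
def features(x, degree=8):
    return np.vander(x, degree + 1, increasing=True)

x_train = rng.uniform(-1, 1, 30)
y_train = 2.0 * x_train + rng.normal(0.0, 0.3, 30)
x_val = rng.uniform(-1, 1, 30)
y_val = 2.0 * x_val + rng.normal(0.0, 0.3, 30)

X_train, X_val = features(x_train), features(x_val)
w = np.zeros(X_train.shape[1])

def mse(X, y, w):
    return np.mean((X @ w - y) ** 2)

best_w, best_val = w.copy(), np.inf
patience, bad_steps, lr = 50, 0, 0.05

for step in range(5000):
    # One full-batch gradient step on the training objective.
    grad = 2.0 * X_train.T @ (X_train @ w - y_train) / len(y_train)
    w -= lr * grad

    # Monitor the validation error as a proxy for generalization.
    val_err = mse(X_val, y_val, w)
    if val_err < best_val:
        best_val, best_w, bad_steps = val_err, w.copy(), 0
    else:
        bad_steps += 1
        if bad_steps > patience:
            # Stop before the optimizer converges to a minimum.
            break

print(round(best_val, 3))
```

Note that this rule sacrifices part of the data for validation, which is exactly the cost the paper's self-contained, analytically monitored variant is designed to avoid.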
Copyright information
© 1997 Springer-Verlag Berlin Heidelberg
Cite this paper
Utans, J. (1997). A non-convergent on-line training algorithm for neural networks. In: Mira, J., Moreno-Díaz, R., Cabestany, J. (eds) Biological and Artificial Computation: From Neuroscience to Technology. IWANN 1997. Lecture Notes in Computer Science, vol 1240. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0032551
Print ISBN: 978-3-540-63047-0
Online ISBN: 978-3-540-69074-0