Abstract
Validation can be used to detect when overfitting starts during supervised training of a neural network; training is then stopped before convergence to avoid the overfitting (“early stopping”). The exact criterion used for validation-based early stopping, however, is usually chosen in an ad-hoc fashion, or training is stopped interactively. This trick describes how to select a stopping criterion in a systematic fashion; it is a trick for either speeding learning procedures or improving generalization, whichever is more important in the particular situation. An empirical investigation on multi-layer perceptrons shows that there is a tradeoff between training time and generalization: from the given mix of 1296 training runs using 12 different problems and 24 different network architectures, I conclude that slower stopping criteria allow for small improvements in generalization (here: about 4% on average), but cost much more training time (here: about a factor of 4 longer on average).
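As a concrete illustration of validation-based early stopping, the sketch below monitors the validation error after every epoch and stops once its relative increase over the best value seen so far exceeds a threshold. This "generalization loss" style criterion and its threshold alpha are shown only as one plausible choice (the chapter compares several families of criteria); the train_one_epoch and validation_error callbacks are hypothetical placeholders, not part of the original text.

```python
def early_stopping_training(train_one_epoch, validation_error,
                            alpha=5.0, max_epochs=1000):
    """Minimal sketch of validation-based early stopping.

    Stops when GL(t) = 100 * (E_va(t) / E_opt(t) - 1) exceeds alpha,
    where E_va(t) is the validation error after epoch t and E_opt(t)
    is the lowest validation error observed up to epoch t.
    """
    best_error = float("inf")
    best_epoch = 0
    for epoch in range(1, max_epochs + 1):
        train_one_epoch()                       # one pass over the training set
        e_va = validation_error()               # error on the held-out validation set
        if e_va < best_error:
            best_error, best_epoch = e_va, epoch
            # a real implementation would also snapshot the weights here
        gl = 100.0 * (e_va / best_error - 1.0)  # generalization loss in percent
        if gl > alpha:                          # larger alpha = "slower" stopping
            break
    return best_epoch, best_error
```

In this framing, "slower" criteria correspond to larger thresholds: they tolerate longer stretches of non-improving validation error before stopping, which is where the time-versus-generalization tradeoff described in the abstract arises.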
References
S. Amari, N. Murata, K.-R. Müller, M. Finke, and H. Yang. Statistical theory of overtraining - is cross-validation effective? In [23], pages 176–182, 1996.
S. Amari, N. Murata, K.-R. Müller, M. Finke, and H. Yang. Asymptotic statistical theory of overtraining and cross-validation. IEEE Trans. on Neural Networks, 8(5):985–996, September 1997.
P. Baldi and Y. Chauvin. Temporal evolution of generalization during learning in linear networks. Neural Computation, 3:589–603, 1991.
J. D. Cowan, G. Tesauro, and J. Alspector, editors. Advances in Neural Information Processing Systems 6, San Mateo, CA, 1994. Morgan Kaufman Publishers Inc.
Y. Le Cun, J. S. Denker, and S. A. Solla. Optimal brain damage. In [22], pages 598–605, 1990.
S. E. Fahlman. An empirical study of learning speed in back-propagation networks. Technical Report CMU-CS-88-162, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, September 1988.
S. E. Fahlman and C. Lebiere. The Cascade-Correlation learning architecture. In [22], pages 524–532, 1990.
E. Fiesler (efiesler@idiap.ch). Comparative bibliography of ontogenic neural networks. Submitted for publication, 1994.
W. Finnoff, F. Hergert, and H. G. Zimmermann. Improving model selection by nonconvergent methods. Neural Networks, 6:771–783, 1993.
S. Geman, E. Bienenstock, and R. Doursat. Neural networks and the bias/variance dilemma. Neural Computation, 4:1–58, 1992.
S. J. Hanson, J. D. Cowan, and C. L. Giles, editors. Advances in Neural Information Processing Systems 5, San Mateo, CA, 1993. Morgan Kaufman Publishers Inc.
B. Hassibi and D. G. Stork. Second order derivatives for network pruning: Optimal brain surgeon. In [11], pages 164–171, 1993.
A. Krogh and J. A. Hertz. A simple weight decay can improve generalization. In [16], pages 950–957, 1992.
A. U. Levin, T. K. Leen, and J. E. Moody. Fast pruning using principal components. In [4], 1994.
R. P. Lippmann, J. E. Moody, and D. S. Touretzky, editors. Advances in Neural Information Processing Systems 3, San Mateo, CA, 1991. Morgan Kaufman Publishers Inc.
J. E. Moody, S. J. Hanson, and R. P. Lippmann, editors. Advances in Neural Information Processing Systems 4, San Mateo, CA, 1992. Morgan Kaufman Publishers Inc.
N. Morgan and H. Bourlard. Generalization and parameter estimation in feedforward nets: Some experiments. In [22], pages 630–637, 1990.
S. J. Nowlan and G. E. Hinton. Simplifying neural networks by soft weight-sharing. Neural Computation, 4(4):473–493, 1992.
L. Prechelt. PROBEN1 - a set of benchmarks and benchmarking rules for neural network training algorithms. Technical Report 21/94, Fakultät für Informatik, Universität Karlsruhe, Germany, September 1994. Anonymous FTP: /pub/papers/techreports/1994/1994-21.ps.gz on ftp.ira.uka.de.
R. Reed. Pruning algorithms - a survey. IEEE Transactions on Neural Networks, 4(5):740–746, 1993.
M. Riedmiller and H. Braun. A direct adaptive method for faster backpropagation learning: The RPROP algorithm. In Proc. of the IEEE Intl. Conf. on Neural Networks, pages 586–591, San Francisco, CA, April 1993.
D. S. Touretzky, editor. Advances in Neural Information Processing Systems 2, San Mateo, CA, 1990. Morgan Kaufman Publishers Inc.
D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, editors. Advances in Neural Information Processing Systems 8, Cambridge, MA, 1996. MIT Press.
C. Wang, S. S. Venkatesh, and J. S. Judd. Optimal stopping and effective machine complexity in learning. In [4], 1994.
A. S. Weigend, D. E. Rumelhart, and B. A. Huberman. Generalization by weight-elimination with application to forecasting. In [15], pages 875–882, 1991.
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
Cite this chapter
Prechelt, L. (1998). Early Stopping - But When?. In: Orr, G.B., Müller, KR. (eds) Neural Networks: Tricks of the Trade. Lecture Notes in Computer Science, vol 1524. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49430-8_3