Abstract
In this paper we address the problem of optimizing regularization parameters in neural network modeling. The proposed optimization scheme is an extended version of the recently presented algorithm [25]. The idea is to minimize an empirical estimate of the generalization error, such as the cross-validation estimate, with respect to the regularization parameters. This is done by a simple iterative gradient descent scheme that adds virtually no programming overhead compared to standard training. Experiments with feed-forward neural network models on time series prediction and classification tasks demonstrate the viability and robustness of the algorithm. Moreover, we provide simple theoretical examples illustrating the potential and limitations of the proposed regularization framework.
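The scheme the abstract describes, gradient descent on a validation-set error with respect to a regularization parameter, can be illustrated in its simplest setting. The following is a minimal sketch using ridge regression instead of a neural network; the synthetic data, variable names, and step size are illustrative assumptions, not the chapter's implementation. The key ingredient is the derivative of the trained weights with respect to the weight-decay parameter, obtained by differentiating the normal equations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data, split into training and validation sets.
X = rng.normal(size=(80, 10))
w_true = rng.normal(size=10)
y = X @ w_true + 0.5 * rng.normal(size=80)
Xt, yt = X[:50], y[:50]        # training set
Xv, yv = X[50:], y[50:]        # validation set

def train(kappa):
    """Weight-decay (ridge) training: w = (Xt'Xt + kappa*I)^-1 Xt'yt."""
    A = Xt.T @ Xt + kappa * np.eye(Xt.shape[1])
    return np.linalg.solve(A, Xt.T @ yt), A

def val_error(w):
    """Mean squared error on the validation set."""
    r = Xv @ w - yv
    return r @ r / len(yv)

# Gradient descent on the validation error w.r.t. log(kappa);
# the log-parameterization keeps kappa positive.
log_kappa = np.log(1.0)
eta = 0.5
for _ in range(100):
    kappa = np.exp(log_kappa)
    w, A = train(kappa)
    # Differentiating the normal equations w.r.t. kappa gives dw/dkappa = -A^-1 w.
    dw_dkappa = -np.linalg.solve(A, w)
    dE_dw = 2.0 / len(yv) * Xv.T @ (Xv @ w - yv)
    dE_dkappa = dE_dw @ dw_dkappa
    log_kappa -= eta * dE_dkappa * kappa   # chain rule for the log-parameterization
```

For a neural network the closed-form solve is replaced by gradient-based training, and the matrix A becomes the Hessian of the regularized cost, but the structure of the update is the same.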
Previously published in: Orr, G.B. and Müller, K.-R. (Eds.): LNCS 1524, ISBN 978-3-540-65311-0 (1998).
References
Akaike, H.: Fitting Autoregressive Models for Prediction. Annals of the Institute of Statistical Mathematics 21, 243–247 (1969)
Amari, S., Murata, N., Müller, K.-R., Finke, M., Yang, H.: Asymptotic Statistical Theory of Overtraining and Cross-Validation. IEEE Transactions on Neural Networks 8(5), 985–996 (1997); also Technical Report METR 95-06 (1995)
Andersen, L.N., Larsen, J., Hansen, L.K., Hintz-Madsen, M.: Adaptive Regularization of Neural Classifiers. In: Principe, J., et al. (eds.) Proceedings of the IEEE Workshop on Neural Networks for Signal Processing VII, pp. 24–33. IEEE, Piscataway (1997)
Bishop, C.M.: Curvature-Driven Smoothing: A Learning Algorithm for Feedforward Neural Networks. IEEE Transactions on Neural Networks 4(4), 882–884 (1993)
Bishop, C.M.: Neural Networks for Pattern Recognition. Oxford University Press, Oxford (1995)
Dennis, J.E., Schnabel, R.B.: Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Prentice-Hall, Englewood Cliffs (1983)
Drucker, H., Le Cun, Y.: Improving Generalization Performance in Character Recognition. In: Juang, B.H., et al. (eds.) Neural Networks for Signal Processing: Proceedings of the 1991 IEEE-SP Workshop, pp. 198–207. IEEE, Piscataway (1991)
Geisser, S.: The Predictive Sample Reuse Method with Applications. Journal of the American Statistical Association 70, 320–328 (1975)
Geman, S., Bienenstock, E., Doursat, R.: Neural Networks and the Bias/Variance Dilemma. Neural Computation 4, 1–58 (1992)
Girosi, F., Jones, M., Poggio, T.: Regularization Theory and Neural Networks Architectures. Neural Computation 7(2), 219–269 (1995)
Goutte, C., Larsen, J.: Adaptive Regularization of Neural Networks using Conjugate Gradient. In: Proceedings of ICASSP 1998, Seattle, USA, vol. 2, pp. 1201–1204 (1998)
Goutte, C.: Note on Free Lunches and Cross-Validation. Neural Computation 9(6), 1211–1215 (1997)
Goutte, C.: Regularization with a Pruning Prior. Neural Networks (1997) (to appear)
Hansen, L.K., Rasmussen, C.E.: Pruning from Adaptive Regularization. Neural Computation 6, 1223–1232 (1994)
Hansen, L.K., Rasmussen, C.E., Svarer, C., Larsen, J.: Adaptive Regularization. In: Vlontzos, J., Hwang, J.-N., Wilson, E. (eds.) Proceedings of the IEEE Workshop on Neural Networks for Signal Processing IV, pp. 78–87. IEEE, Piscataway (1994)
Hansen, L.K., Larsen, J.: Linear Unlearning for Cross-Validation. Advances in Computational Mathematics 5, 269–280 (1996)
Hertz, J., Krogh, A., Palmer, R.G.: Introduction to the Theory of Neural Computation. Addison-Wesley Publishing Company, Redwood City (1991)
Hintz-Madsen, M., With Pedersen, M., Hansen, L.K., Larsen, J.: Design and Evaluation of Neural Classifiers. In: Usui, S., Tohkura, Y., Katagiri, S., Wilson, E. (eds.) Proceedings of the IEEE Workshop on Neural Networks for Signal Processing VI, pp. 223–232. IEEE, Piscataway (1996)
Hornik, K.: Approximation Capabilities of Multilayer Feedforward Networks. Neural Networks 4, 251–257 (1991)
Kearns, M.: A Bound on the Error of Cross Validation Using the Approximation and Estimation Rates, with Consequences for the Training-Test Split. Neural Computation 9(5), 1143–1161 (1997)
Larsen, J.: A Generalization Error Estimate for Nonlinear Systems. In: Kung, S.Y., et al. (eds.) Proceedings of the 1992 IEEE-SP Workshop on Neural Networks for Signal Processing, vol. 2, pp. 29–38. IEEE, Piscataway (1992)
Larsen, J.: Design of Neural Network Filters, Ph.D. Thesis, Electronics Institute, Technical University of Denmark (1993), ftp://eivind.imm.dtu.dk/dist/PhD_thesis/jlarsen.thesis.ps.Z
Larsen, J., Hansen, L.K.: Generalization Performance of Regularized Neural Network Models. In: Vlontzos, J., et al. (eds.) Proceedings of the IEEE Workshop on Neural Networks for Signal Processing IV, pp. 42–51. IEEE, Piscataway (1994)
Larsen, J., Hansen, L.K.: Empirical Generalization Assessment of Neural Network Models. In: Girosi, F., et al. (eds.) Proceedings of the IEEE Workshop on Neural Networks for Signal Processing V, pp. 30–39. IEEE, Piscataway (1995)
Larsen, J., Hansen, L.K., Svarer, C., Ohlsson, M.: Design and Regularization of Neural Networks: The Optimal Use of a Validation Set. In: Usui, S., Tohkura, Y., Katagiri, S., Wilson, E. (eds.) Proceedings of the IEEE Workshop on Neural Networks for Signal Processing VI, pp. 62–71. IEEE, Piscataway (1996)
Larsen, J., et al.: Optimal Data Set Split Ratio for Empirical Generalization Error Estimates (in preparation)
Le Cun, Y., Denker, J.S., Solla, S.A.: Optimal Brain Damage. In: Touretzky, D.S. (ed.) Proceedings of the 1989 Conference on Advances in Neural Information Processing Systems, vol. 2, pp. 598–605. Morgan Kaufmann Publishers, San Mateo (1990)
Lowe, D.: Adaptive Radial Basis Function Nonlinearities and the Problem of Generalisation. In: Proc. IEE Conf. on Artificial Neural Networks, pp. 171–175 (1989)
Ljung, L.: System Identification: Theory for the User. Prentice-Hall, Englewood Cliffs (1987)
MacKay, D.J.C.: A Practical Bayesian Framework for Backprop Networks. Neural Computation 4(3), 448–472 (1992)
Moody, J.: Prediction Risk and Architecture Selection for Neural Networks. In: Cherkassky, V., et al. (eds.) From Statistics to Neural Networks: Theory and Pattern Recognition Applications, vol. 136. Springer-Verlag Series F, Berlin (1994)
Moody, J., Rögnvaldsson, T.: Smoothing Regularizers for Projective Basis Function Networks. In: Proceedings of the 1996 Conference on Advances in Neural Information Processing Systems, vol. 9. MIT Press, Cambridge (1997)
Murata, N., Yoshizawa, S., Amari, S.: Network Information Criterion — Determining the Number of Hidden Units for an Artificial Neural Network Model. IEEE Transactions on Neural Networks 5(6), 865–872 (1994)
Nowlan, S., Hinton, G.: Simplifying Neural Networks by Soft Weight Sharing. Neural Computation 4(4), 473–493 (1992)
With Pedersen, M.: Training Recurrent Networks. In: Proceedings of the IEEE Workshop on Neural Networks for Signal Processing VII. IEEE, Piscataway (1997)
Peterson, G.E., Barney, H.L.: Control Methods Used in a Study of the Vowels. JASA 24, 175–184 (1952)
Shadafan, R.S., Niranjan, M.: A Dynamic Neural Network Architecture by Sequential Partitioning of the Input Space. Neural Computation 6(6), 1202–1222 (1994)
Sjöberg, J.: Non-Linear System Identification with Neural Networks, Ph.D. Thesis no. 381, Department of Electrical Engineering, Linköping University, Sweden (1995)
Stone, M.: Cross-validatory Choice and Assessment of Statistical Predictors. Journal of the Royal Statistical Society B 36(2), 111–147 (1974)
Svarer, C., Hansen, L.K., Larsen, J., Rasmussen, C.E.: Designer Networks for Time Series Processing. In: Kamm, C.A., et al. (eds.) Proceedings of the IEEE Workshop on Neural Networks for Signal Processing, vol. 3, pp. 78–87. IEEE, Piscataway (1993)
Watrous, R.L.: Current Status of Peterson-Barney Vowel Formant Data. JASA 89, 2459–2460 (1991)
Weigend, A.S., Huberman, B.A., Rumelhart, D.E.: Predicting the Future: A Connectionist Approach. International Journal of Neural Systems 1(3), 193–209 (1990)
Williams, P.M.: Bayesian Regularization and Pruning using a Laplace Prior. Neural Computation 7(1), 117–143 (1995)
Wolpert, D.H., Macready, W.G.: The Mathematics of Search. Technical Report SFI-TR-95-02-010, Santa Fe Institute (1995)
Wu, L., Moody, J.: A Smoothing Regularizer for Feedforward and Recurrent Neural Networks. Neural Computation 8(3) (1996)
Zhu, H., Rohwer, R.: No Free Lunch for Cross Validation. Neural Computation 8(7), 1421–1426 (1996)
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
Cite this chapter
Larsen, J., Svarer, C., Andersen, L.N., Hansen, L.K. (2012). Adaptive Regularization in Neural Network Modeling. In: Montavon, G., Orr, G.B., Müller, K.-R. (eds.) Neural Networks: Tricks of the Trade. Lecture Notes in Computer Science, vol. 7700. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35289-8_8
DOI: https://doi.org/10.1007/978-3-642-35289-8_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35288-1
Online ISBN: 978-3-642-35289-8