Generalization Error and Training Error at Singularities of Multilayer Perceptrons

  • Conference paper

In: Connectionist Models of Neurons, Learning Processes, and Artificial Intelligence (IWANN 2001)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2084)

Abstract

The neuromanifold, or parameter space, of multilayer perceptrons includes complex singularities at which the Fisher information matrix degenerates. The parameters are unidentifiable at singularities, and this causes serious difficulties in learning, known as plateaus in the cost function. The natural or adaptive natural gradient method is proposed for overcoming this difficulty. It is important to study the relation between the generalization error and the training error at the singularities, because the generalization error is estimated in terms of the training error. Using a simple model, the generalization error is studied in terms of the Gaussian random field, both for the maximum likelihood estimator (MLE) and for the Bayesian predictive distribution estimator. This elucidates the strange behaviors of learning dynamics around singularities.
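To make the degeneracy concrete, the following is a minimal numerical sketch, not code from the paper: it assumes a two-hidden-unit tanh perceptron with unit-variance Gaussian output noise, estimates its Fisher information matrix by a sample average, shows the smallest eigenvalue collapsing at the singularity where the two hidden units coincide, and takes one damped natural-gradient step. The teacher network, learning rate, and damping constant are illustrative choices.

```python
# Sketch: Fisher degeneracy at a singularity of f(x) = v1*tanh(w1*x) + v2*tanh(w2*x),
# and a damped natural-gradient step. Illustrative assumptions throughout.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=5000)          # inputs drawn i.i.d. from N(0, 1)

def jacobian(theta):
    """Rows are df/dw1, df/dw2, df/dv1, df/dv2 evaluated at each input."""
    w1, w2, v1, v2 = theta
    return np.stack([
        v1 * (1.0 - np.tanh(w1 * x) ** 2) * x,
        v2 * (1.0 - np.tanh(w2 * x) ** 2) * x,
        np.tanh(w1 * x),
        np.tanh(w2 * x),
    ])

def fisher(theta):
    # With unit-variance Gaussian output noise, the Fisher matrix is
    # F = E[(df/dtheta)(df/dtheta)^T], estimated here by a sample average.
    J = jacobian(theta)
    return J @ J.T / x.size

regular  = np.array([1.0, 1.5, 0.5, 0.5])  # two distinct hidden units
singular = np.array([1.0, 1.0, 0.5, 0.5])  # coinciding units: w1 = w2, v1 = v2
for name, theta in (("regular", regular), ("singular", singular)):
    lam = np.linalg.eigvalsh(fisher(theta))
    print(f"{name:8s} point: smallest Fisher eigenvalue = {lam[0]:.2e}")

def natural_gradient_step(theta, lr=0.1, damping=1e-4):
    # One damped natural-gradient step on the squared error against an
    # illustrative teacher network f*(x) = 0.8 * tanh(1.2 x).
    target = 0.8 * np.tanh(1.2 * x)
    w1, w2, v1, v2 = theta
    f = v1 * np.tanh(w1 * x) + v2 * np.tanh(w2 * x)
    grad = jacobian(theta) @ (f - target) / x.size  # gradient of half-MSE
    F = fisher(theta) + damping * np.eye(4)         # damping keeps F invertible
    return theta - lr * np.linalg.solve(F, grad)

theta = natural_gradient_step(regular)
print("after one natural-gradient step:", theta)
```

At the regular point the smallest Fisher eigenvalue is of order one, while at the singular point it drops to numerical zero. The damping term is what keeps the natural-gradient step well defined near the singular set; regularizing or adaptively approximating the inverse Fisher matrix is the practical motivation behind the adaptive natural gradient methods cited in the references [1, 5, 12].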

References

  1. Amari, S.: Natural gradient works efficiently in learning, Neural Computation, 10, 251–276, 1998.

  2. Amari, S. and Murata, N.: Statistical theory of learning curves under entropic loss criterion, Neural Computation, 5, 140–153, 1993.

  3. Amari, S. and Nagaoka, H.: Information Geometry, AMS and Oxford University Press, 2000.

  4. Amari, S. and Ozeki, T.: Differential and algebraic geometry of multilayer perceptrons, IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, E84-A, 31–38, 2001.

  5. Amari, S., Park, H., and Fukumizu, K.: Adaptive method of realizing natural gradient learning for multilayer perceptrons, Neural Computation, 12, 1399–1409, 2000.

  6. Dacunha-Castelle, D. and Gassiat, E.: Testing in locally conic models, and application to mixture models, ESAIM: Probability and Statistics, 1, 285–317, 1997.

  7. Fukumizu, K.: Statistical analysis of unidentifiable models and its application to multilayer neural networks, Memo at Post-Conference of the Bernoulli-RIKEN BSI 2000 Symposium on Neural Networks and Learning, October 2000.

  8. Fukumizu, K.: Likelihood ratio of unidentifiable models and multilayer neural networks, Research Memorandum, 780, Inst. of Statistical Mathematics, 2001.

  9. Hagiwara, K., Kuno, K. and Usui, S.: On the problem in model selection of neural network regression in overrealizable scenario, Proceedings of the International Joint Conference on Neural Networks, 2000.

  10. Hartigan, J. A.: A failure of likelihood asymptotics for normal mixtures, Proceedings of Berkeley Conference in Honor of J. Neyman and J. Kiefer, 2, 807–810, 1985.

  11. Kitahara, M., Hayasaka, T., Toda, N. and Usui, S.: On the probability distribution of estimators of regression model using 3-layered neural networks (in Japanese), Workshop on Information-Based Induction Sciences (IBIS 2000), 21–26, July 2000.

  12. Park, H., Amari, S. and Fukumizu, K.: Adaptive natural gradient learning algorithms for various stochastic models, Neural Networks, 13, 755–764, 2000.

  13. Rattray, M., Saad, D. and Amari, S.: Natural gradient descent for on-line learning, Physical Review Letters, 81, 5461–5464, 1998.

  14. Watanabe, S.: Algebraic analysis for non-identifiable learning machines, Neural Computation, to appear.

  15. Watanabe, S.: Training and generalization errors of learning machines with algebraic singularities (in Japanese), The Trans. of IEICE A, J84-A, 99–108, 2001.

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Amari, Si., Ozeki, T., Park, H. (2001). Generalization Error and Training Error at Singularities of Multilayer Perceptrons. In: Mira, J., Prieto, A. (eds) Connectionist Models of Neurons, Learning Processes, and Artificial Intelligence. IWANN 2001. Lecture Notes in Computer Science, vol 2084. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45720-8_37

  • DOI: https://doi.org/10.1007/3-540-45720-8_37

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42235-8

  • Online ISBN: 978-3-540-45720-6

  • eBook Packages: Springer Book Archive
