Abstract
In on-line gradient descent learning, the local properties of the derivative term of the output can slow convergence. Improving the derivative term, for instance by using the natural gradient, has been proposed as a way of speeding up convergence. Besides this sophisticated method, a "simple method" that replaces the derivative term with a constant has been proposed and shown empirically to greatly increase convergence speed. Although this phenomenon has been analyzed empirically, theoretical analysis is required to establish its generality. In this paper, we theoretically analyze the effect of using the simple method. Our results show that, with the simple method, the generalization error decreases faster than with true gradient descent when the learning step size is smaller than the optimal value η_opt. When it is larger than η_opt, the error decreases more slowly with the simple method, and the residual error is larger than with true gradient descent. Moreover, when there is output noise, η_opt is no longer optimal; thus, the simple method is not robust in noisy circumstances.
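For concreteness, the following is a minimal sketch (not the authors' code) of the two update rules the abstract compares, for a single sigmoidal student unit learning a teacher unit from on-line examples. The setup (erf activation, 1/√N local-field scaling, step size η/N) follows the standard on-line learning formulation; the constant c, the step size eta, and all variable names are illustrative assumptions.

import math
import numpy as np

rng = np.random.default_rng(0)
N = 1000                       # input dimension
eta = 0.5                      # learning step size (the optimal eta_opt depends on the setup)
c = math.sqrt(2.0 / math.pi)   # constant replacing g'(.) in the simple method

g = lambda x: math.erf(x / math.sqrt(2.0))                            # output function
g_prime = lambda x: math.sqrt(2.0 / math.pi) * math.exp(-x * x / 2.0) # its derivative

B = rng.standard_normal(N)
B *= math.sqrt(N) / np.linalg.norm(B)   # teacher weights, normalized so |B|^2 = N
w_true = rng.standard_normal(N)         # student trained with the true gradient
w_simple = w_true.copy()                # student trained with the simple method

for _ in range(50 * N):
    xi = rng.standard_normal(N)         # fresh random example at each step
    t = g(B @ xi / math.sqrt(N))        # teacher output (noise-free case)

    # True gradient descent: the update keeps the derivative term g'(h).
    h = w_true @ xi / math.sqrt(N)
    w_true += (eta / N) * (t - g(h)) * g_prime(h) * xi

    # Simple method: the derivative term is replaced by the constant c.
    h = w_simple @ xi / math.sqrt(N)
    w_simple += (eta / N) * (t - g(h)) * c * xi

With a small step size, the simple-method student's error typically falls faster than the true-gradient student's, matching the behavior the abstract describes; with a large step size the ordering reverses and a larger residual error remains.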
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
Cite this paper
Hara, K., Katahira, K., Okanoya, K., Okada, M. (2012). Theoretical Analysis of Function of Derivative Term in On-Line Gradient Descent Learning. In: Villa, A.E.P., Duch, W., Érdi, P., Masulli, F., Palm, G. (eds) Artificial Neural Networks and Machine Learning – ICANN 2012. ICANN 2012. Lecture Notes in Computer Science, vol 7553. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33266-1_2
DOI: https://doi.org/10.1007/978-3-642-33266-1_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33265-4
Online ISBN: 978-3-642-33266-1