Abstract
Recurrent neural networks (RNNs) unfolded in time are, in theory, able to map any open dynamical system. Nevertheless, they are often claimed to be unable to identify long-term dependencies in the data. In particular, when they are trained with backpropagation through time (BPTT), it is asserted that RNNs unfolded in time fail to learn inter-temporal influences more than ten time steps apart.
This paper provides a disproof of this often-cited statement. We show that RNNs, and especially normalised recurrent neural networks (NRNNs), unfolded in time are indeed very capable of learning time lags of at least one hundred time steps. We further demonstrate that the vanishing gradient problem does not apply to these networks.
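As a rough illustration of the setting the abstract describes, the sketch below unfolds a plain recurrent network over one hundred time steps and trains it with BPTT on a hypothetical delayed-recall task. It is not the normalised architecture (NRNN) studied in the paper; the state transition s_{t+1} = tanh(A s_t + B x_t), the task, and all hyper-parameters are assumptions made purely for illustration. The repeated multiplication by A^T in the backward loop is the Jacobian product usually blamed for vanishing (or exploding) gradients over long time lags.

# Minimal sketch (assumed example, not the authors' method): a plain RNN
# unfolded over T = 100 time steps, trained with BPTT on a delayed-recall
# task where the target is the input seen 100 steps earlier.
import numpy as np

rng = np.random.default_rng(0)
T, H = 100, 16                                  # unfolding length and hidden size

# Parameters of s_{t+1} = tanh(A s_t + B x_t) and output y = C s_T.
A = rng.normal(0, 1.0 / np.sqrt(H), (H, H))
B = rng.normal(0, 1.0, (H, 1))
C = rng.normal(0, 1.0, (1, H))

def bptt_step(x_seq, target, lr=0.05):
    """One BPTT update: forward pass through the unfolded network,
    then backward accumulation of gradients over all T time steps."""
    s = np.zeros((T + 1, H, 1))
    for t in range(T):                          # forward unfolding
        s[t + 1] = np.tanh(A @ s[t] + B * x_seq[t])
    y = float(C @ s[T])
    err = y - target

    dA = np.zeros_like(A); dB = np.zeros_like(B); dC = err * s[T].T
    ds = C.T * err                              # gradient flowing into s_T
    for t in reversed(range(T)):                # backward through time
        dz = ds * (1.0 - s[t + 1] ** 2)         # derivative of tanh
        dA += dz @ s[t].T
        dB += dz * x_seq[t]
        ds = A.T @ dz                           # Jacobian product that may shrink or grow
    for P, dP in ((A, dA), (B, dB), (C, dC)):
        np.clip(dP, -1.0, 1.0, out=dP)          # crude clipping to keep the sketch stable
        P -= lr * dP
    return 0.5 * err ** 2

# Delayed recall: only the first input carries signal; the rest is noise.
for epoch in range(2000):
    x = rng.normal(0, 0.1, T)
    x[0] = rng.choice([-1.0, 1.0])
    loss = bptt_step(x, x[0])
print("final loss:", loss)

The backward loop makes the inter-temporal credit assignment explicit: the error observed at the final step must pass through one hundred Jacobian factors before it reaches the weights that processed the relevant input, which is exactly the path along which gradients are commonly said to vanish.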
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
Cite this paper
Schäfer, A.M., Udluft, S., Zimmermann, H.G. (2006). Learning Long Term Dependencies with Recurrent Neural Networks. In: Kollias, S.D., Stafylopatis, A., Duch, W., Oja, E. (eds) Artificial Neural Networks – ICANN 2006. ICANN 2006. Lecture Notes in Computer Science, vol 4131. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11840817_8
DOI: https://doi.org/10.1007/11840817_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-38625-4
Online ISBN: 978-3-540-38627-8