
Learning Long Term Dependencies with Recurrent Neural Networks

  • Conference paper
Artificial Neural Networks – ICANN 2006 (ICANN 2006)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 4131)


Abstract

Recurrent neural networks (RNNs) unfolded in time are, in theory, able to map any open dynamical system. Nevertheless, they are frequently claimed to be incapable of identifying long-term dependencies in the data. In particular, when they are trained with backpropagation through time (BPTT), it is claimed that RNNs unfolded in time fail to learn inter-temporal influences more than ten time steps apart.
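
For intuition on why BPTT is said to struggle with long lags, the sketch below (a minimal numpy example, not the paper's model; the network size, weight scale, and unfolding length are assumptions made for illustration) propagates an error gradient backwards through an unfolded RNN and prints its norm at increasing lags. With tanh units and modest recurrent weights the norm typically decays geometrically, which is the vanishing-gradient effect referred to above.

```python
import numpy as np

# Minimal sketch (not the paper's model): an RNN unfolded over T time steps,
# s_t = tanh(A @ s_{t-1} + B @ x_t). In BPTT, the gradient of an error at the
# last step with respect to the state at step t contains a product of the
# per-step Jacobians diag(1 - s_k^2) @ A, whose norm can shrink geometrically.

rng = np.random.default_rng(0)
n, T = 8, 100                                          # assumed state size and unfolding length
A = rng.normal(scale=0.5 / np.sqrt(n), size=(n, n))    # recurrent weight matrix
B = rng.normal(scale=1.0 / np.sqrt(n), size=(n, n))    # input weight matrix

# Forward pass through the unfolded network.
s = np.zeros(n)
states = []
for t in range(T):
    x = rng.normal(size=n)
    s = np.tanh(A @ s + B @ x)
    states.append(s)

# Backward pass: propagate the error gradient from the last step down to
# step 0 and record its norm at each lag.
g = np.ones(n)                     # stand-in for the error gradient at the last step
norms = []
for t in reversed(range(T)):
    norms.append(np.linalg.norm(g))
    g = A.T @ ((1.0 - states[t] ** 2) * g)   # Jacobian-transpose product for one step

print(f"gradient norm at lag 0:  {norms[0]:.3e}")
print(f"gradient norm at lag 50: {norms[50]:.3e}")
print(f"gradient norm at lag 99: {norms[-1]:.3e}")
```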

This paper refutes this frequently cited claim. We show that RNNs, and especially normalised recurrent neural networks (NRNNs), unfolded in time are indeed capable of learning time lags of at least a hundred time steps. We further demonstrate that the vanishing gradient problem does not apply to these networks.
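
As a concrete example of what a hundred-step dependency means, the following hypothetical task (the function name and parameters are our own, not taken from the paper) makes the target at each time step equal the input observed 100 steps earlier, so any model that solves it must carry information across the full lag.

```python
import numpy as np

# Hypothetical long-lag task in the spirit of the paper's claim: the target at
# time t is the input observed lag = 100 steps earlier.

def make_long_lag_series(length: int, lag: int = 100, seed: int = 0):
    """Return (inputs, targets) where targets[t] = inputs[t - lag]."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=length)
    y = np.zeros(length)
    y[lag:] = x[:-lag]            # target is the input delayed by `lag` steps
    return x, y

x, y = make_long_lag_series(length=1000, lag=100)
assert np.allclose(y[100:], x[:-100])
```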

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Schäfer, A.M., Udluft, S., Zimmermann, H.G. (2006). Learning Long Term Dependencies with Recurrent Neural Networks. In: Kollias, S.D., Stafylopatis, A., Duch, W., Oja, E. (eds) Artificial Neural Networks – ICANN 2006. ICANN 2006. Lecture Notes in Computer Science, vol 4131. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11840817_8

  • DOI: https://doi.org/10.1007/11840817_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-38625-4

  • Online ISBN: 978-3-540-38627-8
