Abstract
Recurrent neural networks (RNNs) unfolded in time are, in theory, able to map any open dynamical system. Nevertheless, they are often claimed to be unable to identify long-term dependencies in the data. In particular, when they are trained with backpropagation through time (BPTT), it is asserted that RNNs unfolded in time fail to learn inter-temporal influences more than ten time steps apart.
This paper provides a disproof of this often-cited statement. We show that RNNs, and especially normalised recurrent neural networks (NRNNs), unfolded in time are indeed very capable of learning time lags of at least one hundred time steps. We further demonstrate that the vanishing gradient problem does not apply to these networks.
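As a rough illustration of the setting the abstract describes, the sketch below unfolds a plain recurrent network over one hundred time steps and trains it with BPTT on a hypothetical delayed-recall task. It is not the normalised architecture (NRNN) studied in the paper; the state transition s_{t+1} = tanh(A s_t + B x_t), the task, and all hyper-parameters are assumptions made purely for illustration. The repeated multiplication by A^T in the backward loop is the Jacobian product usually blamed for vanishing (or exploding) gradients over long time lags.

# Minimal sketch (assumed example, not the authors' method): a plain RNN
# unfolded over T = 100 time steps, trained with BPTT on a delayed-recall
# task where the target is the input seen 100 steps earlier.
import numpy as np

rng = np.random.default_rng(0)
T, H = 100, 16                                  # unfolding length and hidden size

# Parameters of s_{t+1} = tanh(A s_t + B x_t) and output y = C s_T.
A = rng.normal(0, 1.0 / np.sqrt(H), (H, H))
B = rng.normal(0, 1.0, (H, 1))
C = rng.normal(0, 1.0, (1, H))

def bptt_step(x_seq, target, lr=0.05):
    """One BPTT update: forward pass through the unfolded network,
    then backward accumulation of gradients over all T time steps."""
    s = np.zeros((T + 1, H, 1))
    for t in range(T):                          # forward unfolding
        s[t + 1] = np.tanh(A @ s[t] + B * x_seq[t])
    y = float(C @ s[T])
    err = y - target

    dA = np.zeros_like(A); dB = np.zeros_like(B); dC = err * s[T].T
    ds = C.T * err                              # gradient flowing into s_T
    for t in reversed(range(T)):                # backward through time
        dz = ds * (1.0 - s[t + 1] ** 2)         # derivative of tanh
        dA += dz @ s[t].T
        dB += dz * x_seq[t]
        ds = A.T @ dz                           # Jacobian product that may shrink or grow
    for P, dP in ((A, dA), (B, dB), (C, dC)):
        np.clip(dP, -1.0, 1.0, out=dP)          # crude clipping to keep the sketch stable
        P -= lr * dP
    return 0.5 * err ** 2

# Delayed recall: only the first input carries signal; the rest is noise.
for epoch in range(2000):
    x = rng.normal(0, 0.1, T)
    x[0] = rng.choice([-1.0, 1.0])
    loss = bptt_step(x, x[0])
print("final loss:", loss)

The backward loop makes the inter-temporal credit assignment explicit: the error observed at the final step must pass through one hundred Jacobian factors before it reaches the weights that processed the relevant input, which is exactly the path along which gradients are commonly said to vanish.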
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
Cite this paper
Schäfer, A.M., Udluft, S., Zimmermann, H.G. (2006). Learning Long Term Dependencies with Recurrent Neural Networks. In: Kollias, S.D., Stafylopatis, A., Duch, W., Oja, E. (eds) Artificial Neural Networks – ICANN 2006. ICANN 2006. Lecture Notes in Computer Science, vol 4131. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11840817_8
DOI: https://doi.org/10.1007/11840817_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-38625-4
Online ISBN: 978-3-540-38627-8