Abstract
Unlike traditional recurrent neural networks, the Long Short-Term Memory (LSTM) model generalizes well when presented with training sequences derived from regular and also simple nonregular languages. Our novel combination of LSTM and the decoupled extended Kalman filter, however, learns even faster and generalizes even better, requiring only the 10 shortest exemplars (n ≤ 10) of the context-sensitive language a^n b^n c^n to deal correctly with values of n up to 1000 and more. Even when we consider the relatively high update complexity per timestep, in many cases the hybrid offers faster learning than LSTM by itself.
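To make the task in the abstract concrete, here is a minimal sketch (Python, not the authors' code) of the training data and prediction targets it implies: the network sees the 10 shortest strings of a^n b^n c^n and at each step must predict which symbols may legally come next. The start/end markers S and T, the helper names, and the n values in the test set are illustrative assumptions.

# Sketch only: the a^n b^n c^n next-symbol prediction task described in the
# abstract. Markers 'S'/'T' and all names here are illustrative assumptions.

def anbncn(n: int) -> str:
    """Return the string a^n b^n c^n wrapped in start/end markers."""
    return "S" + "a" * n + "b" * n + "c" * n + "T"

def legal_successors(s: str) -> list[set[str]]:
    """For each position, the set of symbols that may legally follow.

    The task is deterministic everywhere except inside the a-block,
    where either another 'a' or the first 'b' is legal; once the first
    'b' appears, the counts of b's and c's are fully determined.
    """
    targets = []
    for i in range(len(s) - 1):
        if s[i] == "a":
            targets.append({"a", "b"})   # a-block may continue or end
        else:
            targets.append({s[i + 1]})   # uniquely determined next symbol
    return targets

# Training set from the abstract: the 10 shortest exemplars, n = 1..10.
train = [anbncn(n) for n in range(1, 11)]

# Generalization test: much longer strings, up to n = 1000 and beyond.
test = [anbncn(n) for n in (50, 100, 500, 1000)]

A network "deals correctly" with a string in this sense if, at every step, it assigns exactly the legal successor set; counting the a's and reproducing that count for the b's and c's is what makes the language context-sensitive rather than regular.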
Work supported by SNF grant 2100-49’144.96, Spanish Comisión Interministerial de Ciencia y Tecnología grant TIC2000-1599-C02-02, and Generalitat Valenciana grant FPI-99-14-268.
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
Cite this paper
Gers, F.A., Pérez-Ortiz, J.A., Eck, D., Schmidhuber, J. (2002). Learning Context Sensitive Languages with LSTM Trained with Kalman Filters. In: Dorronsoro, J.R. (ed.) Artificial Neural Networks — ICANN 2002. Lecture Notes in Computer Science, vol 2415, pp. 655–660. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46084-5_107
Print ISBN: 978-3-540-44074-1
Online ISBN: 978-3-540-46084-8