Abstract
Function approximators are often used in reinforcement learning tasks with large or continuous state spaces. Artificial neural networks, among them recurrent neural networks, are popular function approximators, especially in tasks where some form of memory is needed, as in real-world partially observable scenarios. However, convergence guarantees for such methods are rarely available. Here, we propose a method based on a novel class of RNNs, echo state networks. A proof of convergence to a bounded region is provided for k-order Markov decision processes. Experiments on POMDPs were performed to test and illustrate the workings of the architecture.
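To make the architecture concrete, the following is a minimal sketch of how an echo state network can serve as a Q-value function approximator with a SARSA-style update. It is not the paper's implementation: all dimensions, hyperparameters, and function names below are illustrative assumptions. The key ESN property is that the input and reservoir weights stay fixed (only rescaled for stability) and only the linear readout is trained.

```python
import numpy as np

# Hedged sketch: an echo state network (ESN) as a Q-value approximator
# with a SARSA-style update on the linear readout. All sizes and
# hyperparameters are illustrative assumptions, not values from the paper.

rng = np.random.default_rng(0)

n_in, n_res, n_actions = 4, 100, 2   # assumed dimensions
alpha, gamma = 0.01, 0.95            # assumed learning rate and discount

# Fixed random input and reservoir weights. The reservoir matrix is
# rescaled so its spectral radius is below 1, which yields the
# "echo state" (fading-memory) property.
W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
W = rng.uniform(-0.5, 0.5, (n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))

# Trainable readout: one row of weights per action, so Q(x, a) = W_out[a] @ x.
W_out = np.zeros((n_actions, n_res))

def step_reservoir(x, obs):
    """One reservoir update; the state is a fading memory of past observations."""
    return np.tanh(W_in @ obs + W @ x)

def q_values(x):
    """Q-value estimates for all actions given the current reservoir state."""
    return W_out @ x

def sarsa_update(x, a, r, x_next, a_next, done):
    """SARSA temporal-difference step on the readout weights only.

    delta = r + gamma * Q(x', a') - Q(x, a); the gradient of Q(x, a)
    with respect to W_out[a] is simply the reservoir state x.
    """
    target = r if done else r + gamma * q_values(x_next)[a_next]
    delta = target - q_values(x)[a]
    W_out[a] += alpha * delta * x
```

Note the design consequence: because the reservoir is fixed, the trained part is a linear approximator over the reservoir state, which is what makes a convergence-to-a-bounded-region analysis of the kind claimed in the abstract tractable, while the recurrent reservoir itself supplies the memory needed for k-order Markov and partially observable tasks.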
© 2006 Springer-Verlag Berlin Heidelberg
Cite this paper
Szita, I., Gyenes, V., Lőrincz, A. (2006). Reinforcement Learning with Echo State Networks. In: Kollias, S.D., Stafylopatis, A., Duch, W., Oja, E. (eds) Artificial Neural Networks – ICANN 2006. ICANN 2006. Lecture Notes in Computer Science, vol 4131. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11840817_86
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-38625-4
Online ISBN: 978-3-540-38627-8