Years and Authors of Summarized Original Work
1992; Watkins
Problem Definition
Many sequential decision problems, ranging from dynamic resource allocation to robotics, can be formulated as stochastic control problems and solved by methods of reinforcement learning. Reinforcement learning (also known as neuro-dynamic programming) has therefore become one of the major approaches to tackling real-life decision problems.
In reinforcement learning, an agent acts in an unknown environment and tries to maximize its long-term return by performing actions and receiving rewards. The most popular mathematical models for describing reinforcement learning problems are the Markov Decision Process (MDP) and its generalization, the partially observable MDP. In contrast to supervised learning, in reinforcement learning the agent learns through interaction with the environment and thus influences the data it will observe in the future. One of the challenges that arises in this setting is the exploration-exploitation dilemma: the agent can either exploit the actions that currently appear best or explore less familiar actions in order to improve its estimates, possibly at the expense of short-term reward.
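The original work summarized here (Watkins 1989; Watkins and Dayan 1992) introduced Q-learning, a model-free algorithm that estimates the optimal action-value function directly from observed transitions. The following Python sketch illustrates the tabular Q-learning update combined with epsilon-greedy exploration; the environment interface (reset, step, actions), the constant learning rate, and all parameter values are illustrative assumptions made for this sketch, not part of the original entry.

import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Q-learning sketch (Watkins 1989; Watkins and Dayan 1992).

    Assumes a hypothetical environment exposing:
      env.actions                        -- finite list of actions
      env.reset() -> state               -- start a new episode
      env.step(state, action) -> (reward, next_state, done)
    """
    Q = defaultdict(float)  # Q[(state, action)], all values start at 0

    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # Epsilon-greedy: exploit the current estimates most of the time,
            # but keep exploring so every state-action pair is tried.
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(state, a)])

            reward, next_state, done = env.step(state, action)

            # Q-learning update:
            #   Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
            best_next = 0.0 if done else max(Q[(next_state, a)] for a in env.actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

            state = next_state

    return Q

Under standard conditions (every state-action pair is visited infinitely often and the learning rate decays appropriately), this update converges to the optimal action-value function; the convergence rate of Q-learning is analyzed, for example, by Even-Dar and Mansour (2003).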
Recommended Reading
Allender E, Arora S, Kearns M, Moore C, Russell A (2002) A note on the representational incompatibility of function approximation and factored dynamics. In: Becker S, Thrun S, Obermayer K (eds) Advances in neural information processing systems 15. MIT, Cambridge
Bertsekas DP, Tsitsiklis JN (1996) Neuro-dynamic programming. Athena Scientific, Belmont
Brafman R, Tennenholtz M (2002) R-max – a general polynomial time algorithm for near-optimal reinforcement learning. J Mach Learn Res 3:213–231
Even-Dar E, Mansour Y (2003) Learning rates for Q-learning. J Mach Learn Res 5:1–25
Guestrin C, Koller D, Parr R, Venkataraman S (2003) Efficient solution algorithms for factored MDPs. J Artif Intell Res 19:399–468
Kakade S (2003) On the sample complexity of reinforcement learning. Ph.D. thesis, University College London
Kearns M, Singh S (2002) Near-optimal reinforcement learning in polynomial time. Mach Learn 49(2–3):209–232
Lusena C, Goldsmith J, Mundhenk M (2001) Nonapproximability results for partially observable Markov decision processes. J Artif Intell Res 14:83–103
Ng AY, Coates A, Diel M, Ganapathi V, Schulte J, Tse B, Berger E, Liang E (2006) Inverted autonomous helicopter flight via reinforcement learning. In: Ang MH Jr, Khatib O (eds) International symposium on experimental robotics. Springer tracts in advanced robotics 21. Springer, Berlin/New York
Papadimitriou CH, Tsitsiklis JN (1987) The complexity of Markov decision processes. Math Oper Res 12(3):441–450
Puterman M (1994) Markov decision processes. Wiley-Interscience, New York
Sutton R (1988) Learning to predict by the methods of temporal differences. Mach Learn 3:9–44
Sutton R, Barto A (1998) Reinforcement learning: an introduction. MIT, Cambridge
Tesauro GJ (1994) TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Comput 6(2):215–219
Tsitsiklis JN, Van Roy B (1996) Feature-based methods for large scale dynamic programming. Mach Learn 22:59–94
Watkins C (1989) Learning from delayed rewards. Ph.D. thesis, Cambridge University
Watkins C, Dayan P (1992) Q-learning. Mach Learn 8(3–4):279–292
© 2016 Springer Science+Business Media New York
About this entry
Cite this entry
Even-Dar, E. (2016). Reinforcement Learning. In: Kao, MY. (eds) Encyclopedia of Algorithms. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-2864-4_341