Reinforcement Learning: Past, Present and Future

Conference paper in Simulated Evolution and Learning (SEAL 1998).

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1585).

Abstract

Reinforcement learning (RL) concerns the problem of a learning agent interacting with its environment to achieve a goal. Instead of being given examples of desired behavior, the learning agent must discover by trial and error how to behave in order to get the most reward. RL has become popular as an approach to artificial intelligence because of its simple algorithms and mathematical foundations (Watkins, 1989; Sutton, 1988; Bertsekas and Tsitsiklis, 1996) and because of a string of strikingly successful applications (e.g., Tesauro, 1995; Crites and Barto, 1996; Zhang and Dietterich, 1996; Nie and Haykin, 1996; Singh and Bertsekas, 1997; Baxter, Tridgell, and Weaver, 1998). An overall introduction to the field is provided by a recent textbook (Sutton and Barto, 1998). Here we summarize three stages in the development of the field, which we coarsely characterize as the past, present, and future of reinforcement learning.

The slides used in the talk corresponding to this extended abstract can be found at http://envy.cs.umass.edu/~rich/SEAL98/sld001.htm.
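
The extended abstract itself contains no code. Purely as an illustrative sketch of the trial-and-error, reward-maximizing loop the abstract describes, the following tabular Q-learning example (in the spirit of Watkins, 1989) runs on a hypothetical five-state corridor task; the environment, constants, and hyperparameters are inventions of this sketch and are not taken from the paper.

import random

# Hypothetical toy task (not from the paper): a corridor of five states.
# The agent starts at state 0 and receives reward +1 on reaching state 4.
N_STATES = 5
ACTIONS = (-1, +1)                  # step left or step right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

# Tabular action-value function Q(s, a), initialized to zero.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    # Environment dynamics: clip to the corridor; reward only at the right end.
    next_state = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward, next_state == N_STATES - 1

for episode in range(200):
    state, done = 0, False
    while not done:
        # Epsilon-greedy action selection: the trial-and-error part.
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            best = max(Q[(state, a)] for a in ACTIONS)
            action = random.choice([a for a in ACTIONS if Q[(state, a)] == best])
        next_state, reward, done = step(state, action)
        # Q-learning update: move Q(s, a) toward the bootstrapped target
        # r + gamma * max_a' Q(s', a').
        target = reward + GAMMA * max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (target - Q[(state, action)])
        state = next_state

# After training, the greedy action in every non-terminal state is +1 (right).
print([max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)])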

References

  • Baxter, J., Tridgell, A., Weaver, L. (1998). KnightCap: A chess program that learns by combining TD(λ) with game-tree search. Proceedings of the Fifteenth International Conference on Machine Learning, pp. 28–36.

  • Bertsekas, D. P., and Tsitsiklis, J. N. (1996). Neuro-Dynamic Programming. Athena Scientific, Belmont, MA.

  • Crites, R. H., and Barto, A. G. (1996). Improving elevator performance using reinforcement learning. In Advances in Neural Information Processing Systems 9, pp. 1017–1023. MIT Press, Cambridge, MA.

  • McCallum, A. K. (1995). Reinforcement Learning with Selective Perception and Hidden State. Ph.D. thesis, University of Rochester.

  • Nie, J., and Haykin, S. (1996). A dynamic channel assignment policy through Q-learning. CRL Report 334. Communications Research Laboratory, McMaster University, Hamilton, Ontario.

  • Precup, D., Sutton, R.S. (1998). Multi-time models for temporally abstract planning. Advances in Neural Information Processing Systems 11. MIT Press, Cambridge, MA.

  • Singh, S. P., and Bertsekas, D. (1997). Reinforcement learning for dynamic channel allocation in cellular telephone systems. In Advances in Neural Information Processing Systems 10, pp. 974–980. MIT Press, Cambridge, MA.

  • Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3:9–44.

  • Sutton, R. S., and Barto, A. G. (1998). Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA.

  • Sutton, R. S., Precup, D., Singh, S. (1998). Between MDPs and semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales. Technical Report 98-74, Department of Computer Science, University of Massachusetts.

  • Tesauro, G. J. (1995). Temporal difference learning and TD-Gammon. Communications of the ACM, 38:58–68.

  • Watkins, C. J. C. H. (1989). Learning from Delayed Rewards. Ph.D. thesis, Cambridge University.

  • Zhang, W., and Dietterich, T. G. (1996). High-performance job-shop scheduling with a time-delay TD(λ) network. In Advances in Neural Information Processing Systems 9, pp. 1024–1030. MIT Press, Cambridge, MA.

Copyright information

© 1999 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sutton, R.S. (1999). Reinforcement Learning: Past, Present and Future. In: McKay, B., Yao, X., Newton, C.S., Kim, J.H., Furuhashi, T. (eds) Simulated Evolution and Learning. SEAL 1998. Lecture Notes in Computer Science, vol 1585. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48873-1_26

  • DOI: https://doi.org/10.1007/3-540-48873-1_26

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-65907-5

  • Online ISBN: 978-3-540-48873-6
