Reinforcement Learning: Past, Present and Future

Conference paper in Simulated Evolution and Learning (SEAL 1998).

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1585).

Abstract

Reinforcement learning (RL) concerns the problem of a learning agent interacting with its environment to achieve a goal. Instead of being given examples of desired behavior, the learning agent must discover by trial and error how to behave in order to get the most reward. RL has become popular as an approach to artificial intelligence because of its simple algorithms and mathematical foundations (Watkins, 1989; Sutton, 1988; Bertsekas and Tsitsiklis, 1996) and because of a string of strikingly successful applications (e.g., Tesauro, 1995; Crites and Barto, 1996; Zhang and Dietterich, 1996; Nie and Haykin, 1996; Singh and Bertsekas, 1997; Baxter, Tridgell, and Weaver, 1998). An overall introduction to the field is provided by a recent textbook (Sutton and Barto, 1998). Here we summarize three stages in the development of the field, which we coarsely characterize as the past, present, and future of reinforcement learning.

The slides used in the talk corresponding to this extended abstract can be found at http://envy.cs.umass.edu/~rich/SEAL98/sld001.htm.
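
The extended abstract itself contains no code. Purely as an illustrative sketch of the trial-and-error, reward-maximizing loop the abstract describes, the following tabular Q-learning example (in the spirit of Watkins, 1989) runs on a hypothetical five-state corridor task; the environment, constants, and hyperparameters are inventions of this sketch and are not taken from the paper.

import random

# Hypothetical toy task (not from the paper): a corridor of five states.
# The agent starts at state 0 and receives reward +1 on reaching state 4.
N_STATES = 5
ACTIONS = (-1, +1)                  # step left or step right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

# Tabular action-value function Q(s, a), initialized to zero.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    # Environment dynamics: clip to the corridor; reward only at the right end.
    next_state = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward, next_state == N_STATES - 1

for episode in range(200):
    state, done = 0, False
    while not done:
        # Epsilon-greedy action selection: the trial-and-error part.
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            best = max(Q[(state, a)] for a in ACTIONS)
            action = random.choice([a for a in ACTIONS if Q[(state, a)] == best])
        next_state, reward, done = step(state, action)
        # Q-learning update: move Q(s, a) toward the bootstrapped target
        # r + gamma * max_a' Q(s', a').
        target = reward + GAMMA * max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (target - Q[(state, action)])
        state = next_state

# After training, the greedy action in every non-terminal state is +1 (right).
print([max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)])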

References

  • Baxter, J., Tridgell, A., Weaver, L. (1998). KnightCap: A chess program that learns by combining TD(λ) with game-tree search. Proceedings of the Fifteenth International Conference on Machine Learning, pp. 28–36.

  • Bertsekas, D. P., and Tsitsiklis, J. N. (1996). Neuro-Dynamic Programming. Athena Scientific, Belmont, MA.

  • Crites, R. H., and Barto, A. G. (1996). Improving elevator performance using reinforcement learning. In Advances in Neural Information Processing Systems 9, pp. 1017–1023. MIT Press, Cambridge, MA.

  • McCallum, A. K. (1995). Reinforcement Learning with Selective Perception and Hidden State. Ph.D. thesis, University of Rochester.

  • Nie, J., and Haykin, S. (1996). A dynamic channel assignment policy through Q-learning. CRL Report 334. Communications Research Laboratory, McMaster University, Hamilton, Ontario.

  • Precup, D., Sutton, R.S. (1998). Multi-time models for temporally abstract planning. Advances in Neural Information Processing Systems 11. MIT Press, Cambridge, MA.

  • Singh, S. P., and Bertsekas, D. (1997). Reinforcement learning for dynamic channel allocation in cellular telephone systems. In Advances in Neural Information Processing Systems 10, pp. 974–980. MIT Press, Cambridge, MA.

  • Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3:9–44.

  • Sutton, R. S., and Barto, A. G. (1998). Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA.

  • Sutton, R. S., Precup, D., Singh, S. (1998). Between MDPs and semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales. Technical Report 98-74, Department of Computer Science, University of Massachusetts.

  • Tesauro, G. J. (1995). Temporal difference learning and TD-Gammon. Communications of the ACM, 38:58–68.

  • Watkins, C. J. C. H. (1989). Learning from Delayed Rewards. Ph.D. thesis, Cambridge University.

  • Zhang, W., and Dietterich, T. G. (1996). High-performance job-shop scheduling with a time-delay TD(λ) network. In Advances in Neural Information Processing Systems 9, pp. 1024–1030. MIT Press, Cambridge, MA.

Copyright information

© 1999 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sutton, R.S. (1999). Reinforcement Learning: Past, Present and Future. In: McKay, B., Yao, X., Newton, C.S., Kim, J.H., Furuhashi, T. (eds) Simulated Evolution and Learning. SEAL 1998. Lecture Notes in Computer Science, vol 1585. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48873-1_26

  • DOI: https://doi.org/10.1007/3-540-48873-1_26

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-65907-5

  • Online ISBN: 978-3-540-48873-6
