Abstract
Formulating the problem facing an intelligent agent as a Markov decision process (MDP) is increasingly common in artificial intelligence, reinforcement learning, artificial life, and artificial neural networks. In this short paper we examine some of the reasons for the appeal of this framework. Foremost among these are its generality, simplicity, and emphasis on goal-directed interaction between the agent and its environment. MDPs may be becoming a common focal point for different approaches to understanding the mind. Finally, we speculate that this focus may be an enduring one insofar as many of the efforts to extend the MDP framework end up bringing a wider class of problems back within it.
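For concreteness, the sketch below illustrates the goal-directed agent-environment interaction the abstract refers to: at each step the agent picks an action, and the environment returns a next state and a reward that depend only on the current state and action. This is an illustrative sketch only, not code from the paper; the toy GridWorld environment and the random policy are assumptions made purely for illustration.

    # Minimal illustrative sketch (not from the paper) of the MDP agent-environment loop.
    # GridWorld and the random policy below are hypothetical stand-ins.
    import random

    class GridWorld:
        """Toy MDP: states 0..4 on a line; actions -1/+1; reward 1 at the right end."""
        def __init__(self):
            self.state = 2

        def step(self, action):
            # Next state and reward depend only on the current state and action
            # (the Markov property).
            self.state = max(0, min(4, self.state + action))
            reward = 1.0 if self.state == 4 else 0.0
            return self.state, reward

    env = GridWorld()
    state = env.state
    for t in range(10):
        action = random.choice([-1, +1])   # the agent's (here: random) policy
        state, reward = env.step(action)   # environment returns next state and reward
        print(t, state, reward)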
Copyright information
© 1997 Springer-Verlag Berlin Heidelberg
Cite this paper
Sutton, R.S. (1997). On the significance of Markov decision processes. In: Gerstner, W., Germond, A., Hasler, M., Nicoud, JD. (eds) Artificial Neural Networks — ICANN'97. ICANN 1997. Lecture Notes in Computer Science, vol 1327. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0020167
DOI: https://doi.org/10.1007/BFb0020167
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63631-1
Online ISBN: 978-3-540-69620-9