
Partially observable Markov decision processes for artificial intelligence

  • Invited Lectures
  • Conference paper
Reasoning with Uncertainty in Robotics (RUR 1995)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1093)


Abstract

In this paper, we bring techniques from operations research to bear on the problem of choosing optimal actions in partially observable stochastic domains. In many cases, we have developed new ways of viewing the problem that are, perhaps, more consistent with the AI perspective. We begin by introducing the theory of Markov decision processes (MDPs) and partially observable Markov decision processes (POMDPs). We then outline a novel algorithm for solving POMDPs off-line and show how, in many cases, a finite-memory controller can be extracted from the solution to a POMDP. We conclude with a simple example.
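As background to the abstract, the fully observable case it builds on can be sketched with value iteration, the classical dynamic-programming method for MDPs. The sketch below is illustrative only: the two-state domain, the action names, and all transition probabilities and rewards are invented for this example and do not come from the paper.

```python
# Value iteration for a tiny, fully observable MDP (illustrative sketch).
# P[s][a] is a list of (next_state, probability) pairs; R[s][a] is the
# immediate reward for taking action a in state s.

def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-6):
    """Return the optimal value function V and a greedy policy."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            # Bellman backup: best one-step lookahead value over actions.
            q = [R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
                 for a in actions]
            best = max(q)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Extract the policy that is greedy with respect to V.
    policy = {
        s: max(actions,
               key=lambda a: R[s][a]
               + gamma * sum(p * V[s2] for s2, p in P[s][a]))
        for s in states
    }
    return V, policy

# A made-up two-state example: "go" switches states, "stay" remains;
# only staying in state B yields reward.
states = ["A", "B"]
actions = ["stay", "go"]
P = {"A": {"stay": [("A", 1.0)], "go": [("B", 1.0)]},
     "B": {"stay": [("B", 1.0)], "go": [("A", 1.0)]}}
R = {"A": {"stay": 0.0, "go": 0.0},
     "B": {"stay": 1.0, "go": 0.0}}

V, policy = value_iteration(states, actions, P, R)
# The greedy policy goes to B and stays there; V["B"] approaches
# 1 / (1 - gamma) = 10 and V["A"] approaches gamma * V["B"] = 9.
```

The POMDP setting the paper addresses generalizes this: the agent cannot observe the state directly, so the backup must instead be performed over probability distributions (belief states), which is what makes exact solution far harder than the loop above.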




Editor information

Leo Dorst, Michiel van Lambalgen, Frans Voorbraak


Copyright information

© 1996 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kaelbling, L.P., Littman, M.L., Cassandra, A.R. (1996). Partially observable Markov decision processes for artificial intelligence. In: Dorst, L., van Lambalgen, M., Voorbraak, F. (eds) Reasoning with Uncertainty in Robotics. RUR 1995. Lecture Notes in Computer Science, vol 1093. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0013957


  • DOI: https://doi.org/10.1007/BFb0013957


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-61376-3

  • Online ISBN: 978-3-540-68506-7

