Partially Observable Markov Decision Processes

Synonyms

POMDPs; Belief state Markov decision processes; Dynamic decision networks; Dual control

Definition

A partially observable Markov decision process (POMDP) is a model for a class of sequential decision-making problems under uncertainty, namely those with partially observable states and uncertain action effects. A POMDP is formally defined by a tuple \(\langle \mathcal{S},\ \mathcal{A},\ \mathcal{O},\ T,\ Z,\ R,\ b_0,\ h,\ \gamma \rangle\), where \(\mathcal{S}\) is the set of states \(s\); \(\mathcal{A}\) is the set of actions \(a\); \(\mathcal{O}\) is the set of observations \(o\); \(T(s, a, s') = \Pr(s' \mid s, a)\) is the transition function, indicating the probability of reaching \(s'\) when executing \(a\) in \(s\); \(Z(a, s', o) = \Pr(o \mid a, s')\) is the observation function, indicating the probability of observing \(o\) in state \(s'\) after executing \(a\); \(R(s, a) \in \mathbb{R}\) is the reward function, indicating the (immediate) expected utility of executing \(a\) in \(s\); \(b_0 = \Pr(s_0)\) is the initial belief, i.e., the distribution over the initial state \(s_0\); \(h\) is the planning horizon; and \(\gamma \in [0, 1]\) is the discount factor.
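
Since the state is never directly observed, an agent maintains a belief \(b\), i.e., a probability distribution over \(\mathcal{S}\), and revises it after each action \(a\) and observation \(o\) by Bayes' rule: \(b'(s') \propto Z(a, s', o)\,\sum_{s \in \mathcal{S}} T(s, a, s')\, b(s)\). The sketch below illustrates this update for a finite POMDP; it is a minimal illustration in Python with NumPy, and the array layouts and the name belief_update are choices made here for the example, not notation from the entry.

    import numpy as np

    def belief_update(b, a, o, T, Z):
        """Bayes-rule belief update for a finite POMDP.

        b : (|S|,) array, current belief over states.
        a : index of the executed action.
        o : index of the received observation.
        T : (|A|, |S|, |S|) array with T[a, s, s2] = Pr(s2 | s, a).
        Z : (|A|, |S|, |O|) array with Z[a, s2, o] = Pr(o | a, s2).
        """
        predicted = b @ T[a]                   # prediction: sum_s b(s) T(s, a, s')
        unnormalized = Z[a, :, o] * predicted  # correction: weight by observation likelihood
        norm = unnormalized.sum()              # equals Pr(o | b, a)
        if norm == 0.0:
            raise ValueError("observation has zero probability under this belief and action")
        return unnormalized / norm

Iterating this update recasts the POMDP as a fully observable Markov decision process over beliefs, which is the view behind the "belief state Markov decision processes" synonym above.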

Recommended Reading

  • Aberdeen, D., & Baxter, J. (2002). Scalable internal-state policy-gradient methods for POMDPs. In International Conference on Machine Learning, pp. 3–10.

  • Amato, C., Bernstein, D. S., & Zilberstein, S. (2009). Optimizing fixed-size stochastic controllers for POMDPs and decentralized POMDPs. Autonomous Agents and Multi-Agent Systems, 21, 293–320.

  • Amato, C., Bernstein, D. S., & Zilberstein, S. (2007). Solving POMDPs using quadratically constrained linear programs. In International Joint Conference on Artificial Intelligence, pp. 2418–2424.

  • Åström, K. J. (1965). Optimal control of Markov decision processes with incomplete state estimation. Journal of Mathematical Analysis and Applications, 10, 174–205.

  • Boutilier, C., & Poole, D. (1996). Computing optimal policies for partially observable decision processes using compact representations. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, pp. 1168–1175.

  • Buede, D. M. (1999). Dynamic decision networks: An approach for solving the dual control problem. Cincinnati: Spring INFORMS.

  • Drake, A. (1962). Observation of a Markov process through a noisy channel. PhD thesis, Massachusetts Institute of Technology.

  • Hansen, E. (1997). An improved policy iteration algorithm for partially observable MDPs. In Neural Information Processing Systems, pp. 1015–1021.

  • Hauskrecht, M., & Fraser, H. S. F. (2000). Planning treatment of ischemic heart disease with partially observable Markov decision processes. Artificial Intelligence in Medicine, 18, 221–244.

  • Hoey, J., Poupart, P., von Bertoldi, A., Craig, T., Boutilier, C., & Mihailidis, A. (2010). Automated handwashing assistance for persons with dementia using video and a partially observable Markov decision process. Computer Vision and Image Understanding, 114, 503–519.

  • Kaelbling, L. P., Littman, M., & Cassandra, A. (1998). Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101, 99–134.

  • Meuleau, N., Peshkin, L., Kim, K.-E., & Kaelbling, L. P. (1999). Learning finite-state controllers for partially observable environments. In Uncertainty in Artificial Intelligence, pp. 427–436.

  • Pineau, J. & Gordon, G. (2005). POMDP planning for robust robot control. In International Symposium on Robotics Research, pp. 69–82.

  • Pineau, J., Gordon, G. J., & Thrun, S. (2003). Policy-contingent abstraction for robust robot control. In Uncertainty in Artificial Intelligence, pp. 477–484.

  • Pineau, J., Gordon, G., & Thrun, S. (2006). Anytime point-based approximations for large POMDPs. Journal of Artificial Intelligence Research, 27, 335–380.

  • Gmytrasiewicz, P. J., & Doshi, P. (2005). A framework for sequential planning in multi-agent settings. Journal of Artificial Intelligence Research, 24, 49–79.

  • Porta, J. M., Vlassis, N. A., Spaan, M. T. J., & Poupart, P. (2006). Point-based value iteration for continuous POMDPs. Journal of Machine Learning Research, 7, 2329–2367.

  • Poupart, P., & Boutilier, C. (2004). VDCBPI: An approximate scalable algorithm for large POMDPs. In Neural Information Processing Systems, pp. 1081–1088.

  • Poupart, P., & Vlassis, N. (2008). Model-based Bayesian reinforcement learning in partially observable domains. In International Symposium on Artificial Intelligence and Mathematics (ISAIM).

  • Puterman, M. L. (1994). Markov decision processes. New York: Wiley.

  • Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77, 257–286.

  • Ross, S., Chaib-draa, B., & Pineau, J. (2007). Bayes-adaptive POMDPs. In Advances in Neural Information Processing Systems (NIPS).

  • Ross, S., Pineau, J., Paquet, S., & Chaib-draa, B. (2008). Online planning algorithms for POMDPs. Journal of Artificial Intelligence Research, 32, 663–704.

  • Roy, N., Gordon, G. J., & Thrun, S. (2005). Finding approximate POMDP solutions through belief compression. Journal of Artificial Intelligence Research, 23, 1–40.

  • Shani, G., & Meek, C. (2009). Improving existing fault recovery policies. In Neural Information Processing Systems.

  • Shani, G., Brafman, R. I., Shimony, S. E., & Poupart, P. (2008). Efficient ADD operations for point-based algorithms. In International Conference on Automated Planning and Scheduling, pp. 330–337.

  • Sim, H. S., Kim, K.-E., Kim, J. H., Chang, D.-S., & Koo, M.-W. (2008). Symbolic heuristic search value iteration for factored POMDPs. In Twenty-Third National Conference on Artificial Intelligence (AAAI), pp. 1088–1093.

  • Smallwood, R. D., & Sondik, E. J. (1973). The optimal control of partially observable Markov decision processes over a finite horizon. Operations Research, 21, 1071–1088.

  • Theocharous, G., & Mahadevan, S. (2002). Approximate planning with hierarchical partially observable Markov decision process models for robot navigation. In IEEE International Conference on Robotics and Automation, pp. 1347–1352.

  • Thomson, B., & Young, S. (2010). Bayesian update of dialogue state: A POMDP framework for spoken dialogue systems. Computer Speech & Language, 24, 562–588.

  • Toussaint, M., Charlin, L., & Poupart, P. (2008). Hierarchical POMDP controller optimization by likelihood maximization. In Uncertainty in Artificial Intelligence, pp. 562–570.

Copyright information

© 2011 Springer Science+Business Media, LLC

Cite this entry

Poupart, P. (2011). Partially Observable Markov Decision Processes. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-30164-8_629
