Partially Observable Markov Decision Processes

Synonyms

POMDPs; Belief state Markov decision processes; Dynamic decision networks; Dual control

Definition

A partially observable Markov decision process (POMDP) is a model for a class of sequential decision-making problems under uncertainty, namely those with partially observable states and uncertain action effects. A POMDP is formally defined by a tuple \(\langle \mathcal{S},\ \mathcal{A},\ \mathcal{O},\ T,\ Z,\ R,\ b_0,\ h,\ \gamma \rangle\), where \(\mathcal{S}\) is the set of states \(s\); \(\mathcal{A}\) is the set of actions \(a\); \(\mathcal{O}\) is the set of observations \(o\); \(T(s, a, s') = \Pr(s' \mid s, a)\) is the transition function, indicating the probability of reaching \(s'\) when executing \(a\) in \(s\); \(Z(a, s', o) = \Pr(o \mid a, s')\) is the observation function, indicating the probability of observing \(o\) in state \(s'\) after executing \(a\); \(R(s, a) \in \mathbb{R}\) is the reward function, indicating the (immediate) expected utility of executing \(a\) in \(s\); \(b_0 = \Pr(s_0)\) is the initial belief, i.e., the distribution over the initial state \(s_0\); \(h\) is the planning horizon; and \(\gamma \in [0, 1]\) is the discount factor.
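
Since the state is never directly observed, an agent maintains a belief \(b\), i.e., a probability distribution over \(\mathcal{S}\), and revises it after each action \(a\) and observation \(o\) by Bayes' rule: \(b'(s') \propto Z(a, s', o)\,\sum_{s \in \mathcal{S}} T(s, a, s')\, b(s)\). The sketch below illustrates this update for a finite POMDP; it is a minimal illustration in Python with NumPy, and the array layouts and the name belief_update are choices made here for the example, not notation from the entry.

    import numpy as np

    def belief_update(b, a, o, T, Z):
        """Bayes-rule belief update for a finite POMDP.

        b : (|S|,) array, current belief over states.
        a : index of the executed action.
        o : index of the received observation.
        T : (|A|, |S|, |S|) array with T[a, s, s2] = Pr(s2 | s, a).
        Z : (|A|, |S|, |O|) array with Z[a, s2, o] = Pr(o | a, s2).
        """
        predicted = b @ T[a]                   # prediction: sum_s b(s) T(s, a, s')
        unnormalized = Z[a, :, o] * predicted  # correction: weight by observation likelihood
        norm = unnormalized.sum()              # equals Pr(o | b, a)
        if norm == 0.0:
            raise ValueError("observation has zero probability under this belief and action")
        return unnormalized / norm

Iterating this update recasts the POMDP as a fully observable Markov decision process over beliefs, which is the view behind the "belief state Markov decision processes" synonym above.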

Recommended Reading

  • Aberdeen, D., & Baxter, J. (2002). Scalable internal-state policy-gradient methods for POMDPs. In International Conference on Machine Learning, pp. 3–10.

  • Amato, C., Bernstein, D. S., & Zilberstein, S. (2009). Optimizing fixed-size stochastic controllers for POMDPs and decentralized POMDPs. Autonomous Agents and Multi-Agent Systems, 21, 293–320.

  • Amato, C., Bernstein, D. S., & Zilberstein, S. (2007). Solving POMDPs using quadratically constrained linear programs. In International Joint Conference on Artificial Intelligence, pp. 2418–2424.

  • Åström, K. J. (1965). Optimal control of Markov decision processes with incomplete state estimation. Journal of Mathematical Analysis and Applications, 10, 174–205.

  • Boutilier, C., & Poole, D. (1996). Computing optimal policies for partially observable decision processes using compact representations. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, pp. 1168–1175.

  • Buede, D. M. (1999). Dynamic decision networks: An approach for solving the dual control problem. Cincinnati: Spring INFORMS.

  • Drake, A. (1962). Observation of a Markov process through a noisy channel. PhD thesis, Massachusetts Institute of Technology.

  • Hansen, E. (1997). An improved policy iteration algorithm for partially observable MDPs. In Neural Information Processing Systems, pp. 1015–1021.

  • Hauskrecht, M., & Fraser, H. S. F. (2000). Planning treatment of ischemic heart disease with partially observable Markov decision processes. Artificial Intelligence in Medicine, 18, 221–244.

  • Hoey, J., Poupart, P., von Bertoldi, A., Craig, T., Boutilier, C., & Mihailidis, A. (2010). Automated handwashing assistance for persons with dementia using video and a partially observable Markov decision process. Computer Vision and Image Understanding, 114, 503–519.

  • Kaelbling, L. P., Littman, M., & Cassandra, A. (1998). Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101, 99–134.

  • Meuleau, N., Peshkin, L., Kim, K.-E., & Kaelbling, L. P. (1999). Learning finite-state controllers for partially observable environments. In Uncertainty in Artificial Intelligence, pp. 427–436.

  • Pineau, J. & Gordon, G. (2005). POMDP planning for robust robot control. In International Symposium on Robotics Research, pp. 69–82.

  • Pineau, J., Gordon, G. J., & Thrun, S. (2003). Policy-contingent abstraction for robust robot control. In Uncertainty in Artificial Intelligence, pp. 477–484.

  • Pineau, J., Gordon, G., & Thrun, S. (2006). Anytime point-based approximations for large POMDPs. Journal of Artificial Intelligence Research, 27, 335–380.

  • Gmytrasiewicz, P. J., & Doshi, P. (2005). A framework for sequential planning in multi-agent settings. Journal of Artificial Intelligence Research, 24, 49–79.

  • Porta, J. M., Vlassis, N. A., Spaan, M. T. J., & Poupart, P. (2006). Point-based value iteration for continuous POMDPs. Journal of Machine Learning Research, 7, 2329–2367.

  • Poupart, P., & Boutilier, C. (2004). VDCBPI: An approximate scalable algorithm for large POMDPs. In Neural Information Processing Systems, pp. 1081–1088.

  • Poupart, P., & Vlassis, N. (2008). Model-based Bayesian reinforcement learning in partially observable domains. In International Symposium on Artificial Intelligence and Mathematics (ISAIM).

  • Puterman, M. L. (1994). Markov decision processes. New York: Wiley.

  • Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77, 257–286.

  • Ross, S., Chaib-draa, B., & Pineau, J. (2007). Bayes-adaptive POMDPs. In Advances in Neural Information Processing Systems (NIPS).

  • Ross, S., Pineau, J., Paquet, S., & Chaib-draa, B. (2008). Online planning algorithms for POMDPs. Journal of Artificial Intelligence Research, 32, 663–704.

  • Roy, N., Gordon, G. J., & Thrun, S. (2005). Finding approximate POMDP solutions through belief compression. Journal of Artificial Intelligence Research, 23, 1–40.

  • Shani, G., & Meek, C. (2009). Improving existing fault recovery policies. In Neural Information Processing Systems.

  • Shani, G., Brafman, R. I., Shimony, S. E., & Poupart, P. (2008). Efficient ADD operations for point-based algorithms. In International Conference on Automated Planning and Scheduling, pp. 330–337.

  • Sim, H. S., Kim, K.-E., Kim, J. H., Chang, D.-S., & Koo, M.-W. (2008). Symbolic heuristic search value iteration for factored POMDPs. In Twenty-Third National Conference on Artificial Intelligence (AAAI), pp. 1088–1093.

  • Smallwood, R. D., & Sondik, E. J. (1973). The optimal control of partially observable Markov decision processes over a finite horizon. Operations Research, 21, 1071–1088.

  • Theocharous, G., & Mahadevan, S. (2002). Approximate planning with hierarchical partially observable Markov decision process models for robot navigation. In IEEE International Conference on Robotics and Automation, pp. 1347–1352.

  • Thomson, B., & Young, S. (2010). Bayesian update of dialogue state: A POMDP framework for spoken dialogue systems. Computer Speech & Language, 24, 562–588.

  • Toussaint, M., Charlin, L., & Poupart, P. (2008). Hierarchical POMDP controller optimization by likelihood maximization. In Uncertainty in Artificial Intelligence, pp. 562–570.

Copyright information

© 2011 Springer Science+Business Media, LLC

Cite this entry

Poupart, P. (2011). Partially Observable Markov Decision Processes. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-30164-8_629
