Definition
A partially observable Markov decision process (POMDP) is a model for a class of sequential decision-making problems under uncertainty, namely problems with partially observable states and uncertain action effects. A POMDP is formally defined by a tuple \(\langle \mathcal{S},\ \mathcal{A},\ \mathcal{O},\ T,\ Z,\ R,\ b_0,\ h,\ \gamma \rangle\) where \(\mathcal{S}\) is the set of states \(s\), \(\mathcal{A}\) is the set of actions \(a\), \(\mathcal{O}\) is the set of observations \(o\), \(T(s, a, s') = \Pr(s' \mid s, a)\) is the transition function indicating the probability of reaching \(s'\) when executing \(a\) in \(s\), \(Z(a, s', o') = \Pr(o' \mid a, s')\) is the observation function indicating the probability of observing \(o'\) in state \(s'\) after executing \(a\), \(R(s, a) \in \mathbb{R}\) is the reward function indicating the (immediate) expected utility of executing \(a\) in \(s\), \(b_0 = \Pr(s_0)\) is the distribution over initial states, \(h\) is the planning horizon, and \(\gamma \in [0, 1]\) is the discount factor.
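Because the agent cannot observe the state directly, it maintains a belief \(b\), a distribution over states, updated by Bayes' rule after each action \(a\) and observation \(o\): \(b'(s') \propto Z(a, s', o) \sum_s T(s, a, s')\, b(s)\). The sketch below illustrates this update for a toy two-state problem; the transition and observation probabilities are purely illustrative, not from any specific benchmark.

```python
import numpy as np

# Illustrative two-state, two-action, two-observation POMDP.
# T[a, s, s'] = Pr(s' | s, a): transition function
T = np.array([
    [[0.9, 0.1], [0.2, 0.8]],      # action 0
    [[1.0, 0.0], [0.0, 1.0]],      # action 1
])
# Z[a, s', o'] = Pr(o' | a, s'): observation function
Z = np.array([
    [[0.85, 0.15], [0.15, 0.85]],  # action 0
    [[0.50, 0.50], [0.50, 0.50]],  # action 1
])

def belief_update(b, a, o):
    """Bayes filter: b'(s') ∝ Z(a, s', o) * sum_s T(s, a, s') b(s)."""
    predicted = b @ T[a]           # sum_s b(s) T(s, a, s')
    unnorm = Z[a, :, o] * predicted
    return unnorm / unnorm.sum()   # normalize over s'

b0 = np.array([0.5, 0.5])          # initial belief b_0
b1 = belief_update(b0, a=0, o=0)   # belief after acting and observing
```

Since the belief is a sufficient statistic for the history of actions and observations, a POMDP can be viewed as a fully observable MDP over belief states, which is the basis of most exact and point-based solution methods cited below.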
Recommended Reading
Aberdeen, D., & Baxter, J. (2002). Scalable internal-state policy-gradient methods for POMDPs. In International Conference on Machine Learning, pp. 3–10.
Amato, C., Bernstein, D. S., & Zilberstein, S. (2009). Optimizing fixed-size stochastic controllers for POMDPs and decentralized POMDPs. Journal of Autonomous Agents and Multi-Agent Systems, 21, 293–320.
Amato, C., Bernstein, D. S., & Zilberstein, S. (2007). Solving POMDPs using quadratically constrained linear programs. In International Joint Conferences on Artificial Intelligence, pp. 2418–2424.
Åström, K. J. (1965). Optimal control of Markov decision processes with incomplete state estimation. Journal of Mathematical Analysis and Applications, 10, 174–205.
Boutilier, C., & Poole, D. (1996). Computing optimal policies for partially observable decision processes using compact representations. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, pp. 1168–1175.
Buede, D. M. (1999). Dynamic decision networks: An approach for solving the dual control problem. Cincinnati: Spring INFORMS.
Drake, A. (1962). Observation of a Markov Process through a noisy channel. PhD thesis, Massachusetts Institute of Technology.
Hansen, E. (1997). An improved policy iteration algorithm for partially observable MDPs. In Neural Information Processing Systems, pp. 1015–1021.
Hauskrecht, M., & Fraser, H. S. F. (2000). Planning treatment of ischemic heart disease with partially observable Markov decision processes. Artificial Intelligence in Medicine, 18, 221–244.
Hoey, J., Poupart, P., von Bertoldi, A., Craig, T., Boutilier, C., & Mihailidis, A. (2010). Automated handwashing assistance for persons with dementia using video and a partially observable Markov decision process. Computer Vision and Image Understanding, 114, 503–519.
Kaelbling, L. P., Littman, M., & Cassandra, A. (1998). Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101, 99–134.
Meuleau, N., Peshkin, L., Kim, K.-E., & Kaelbling, L. P. (1999). Learning finite-state controllers for partially observable environments. In Uncertainty in Artificial Intelligence, pp. 427–436.
Pineau, J. & Gordon, G. (2005). POMDP planning for robust robot control. In International Symposium on Robotics Research, pp. 69–82.
Pineau, J., Gordon, G. J., & Thrun, S. (2003). Policy-contingent abstraction for robust robot control. In Uncertainty in Artificial Intelligence, pp. 477–484.
Pineau, J., Gordon, G., & Thrun, S. (2006). Anytime point-based approximations for large POMDPs. Journal of Artificial Intelligence Research, 27, 335–380.
Gmytrasiewicz, P. J., & Doshi, P. (2005). A framework for sequential planning in multi-agent settings. Journal of Artificial Intelligence Research, 24, 49–79.
Porta, J. M., Vlassis, N. A., Spaan, M. T. J., & Poupart, P. (2006). Point-based value iteration for continuous POMDPs. Journal of Machine Learning Research, 7, 2329–2367.
Poupart, P., & Boutilier, C. (2004). VDCBPI: An approximate scalable algorithm for large POMDPs. In Neural Information Processing Systems, pp. 1081–1088.
Poupart, P., & Vlassis, N. (2008). Model-based Bayesian reinforcement learning in partially observable domains. In International Symposium on Artificial Intelligence and Mathematics (ISAIM).
Puterman, M. L. (1994). Markov decision processes. New York: Wiley.
Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77, 257–286.
Ross, S., Chaib-Draa, B., & Pineau, J. (2007). Bayes-adaptive POMDPs. In Advances in Neural Information Processing Systems (NIPS).
Ross, S., Pineau, J., Paquet, S., & Chaib-draa, B. (2008). Online planning algorithms for POMDPs. Journal of Artificial Intelligence Research, 32, 663–704.
Roy, N., Gordon, G. J., & Thrun, S. (2005). Finding approximate POMDP solutions through belief compression. Journal of Artificial Intelligence Research, 23, 1–40.
Shani, G., & Meek, C. (2009). Improving existing fault recovery policies. In Neural Information Processing Systems.
Shani, G., Brafman, R. I., Shimony, S. E., & Poupart, P. (2008). Efficient ADD operations for point-based algorithms. In International Conference on Automated Planning and Scheduling, pp. 330–337.
Sim, H. S., Kim, K.-E., Kim, J. H., Chang, D.-S., & Koo, M.-W. (2008). Symbolic heuristic search value iteration for factored POMDPs. In Twenty-Third National Conference on Artificial Intelligence (AAAI), pp. 1088–1093.
Smallwood, R. D., & Sondik, E. J. (1973). The optimal control of partially observable Markov decision processes over a finite horizon. Operations Research, 21, 1071–1088.
Theocharous, G., & Mahadevan, S. (2002). Approximate planning with hierarchical partially observable Markov decision process models for robot navigation. In IEEE International Conference on Robotics and Automation, pp. 1347–1352.
Thomson, B., & Young, S. (2010). Bayesian update of dialogue state: A POMDP framework for spoken dialogue systems. Computer Speech & Language, 24, 562–588.
Toussaint, M., Charlin, L., & Poupart, P. (2008). Hierarchical POMDP controller optimization by likelihood maximization. In Uncertainty in Artificial Intelligence, pp. 562–570.
© 2011 Springer Science+Business Media, LLC
Poupart, P. (2011). Partially Observable Markov Decision Processes. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-30164-8_629