Definition
A Markov Decision Process (MDP) is a discrete, stochastic, and generally finite model of a system to which some external control can be applied. Originally developed in the Operations Research and Statistics communities, MDPs, and their extension to Partially Observable Markov Decision Processes (POMDPs), are now commonly used in the study of reinforcement learning in the Artificial Intelligence and Robotics communities (Bellman, 1957; Bertsekas & Tsitsiklis, 1996; Howard, 1960; Puterman, 1994). When used for reinforcement learning, the parameters of an MDP are first learned from data, and then the MDP is solved to choose a behavior.
Formally, an MDP is defined as a tuple \(\langle\mathcal{S},\mathcal{A},T,R\rangle\), where \(\mathcal{S}\) is a discrete set of states, \(\mathcal{A}\) is a discrete set of actions, \(T : \mathcal{S}\times \mathcal{A}\rightarrow (\mathcal{S}\rightarrow \mathbb{R})\) is a stochastic transition function mapping each state–action pair to a probability distribution over successor states, and \(R : \mathcal{S}\times \mathcal{A}\rightarrow \mathbb{R}\) is a reward function giving the expected immediate reward for taking an action in a state.
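To make the tuple concrete, the following is a minimal sketch in Python with NumPy. The three-state, two-action MDP, its transition probabilities and rewards, and the value_iteration helper are purely illustrative assumptions, not anything defined by this entry; the sketch simply applies the Bellman optimality backup until the value estimates stop changing, which is the dynamic-programming approach described by Bellman (1957) and Puterman (1994).

import numpy as np

# Hypothetical 3-state, 2-action MDP; all numbers below are illustrative only.
# T[s, a] is a probability distribution over successor states: T : S x A -> (S -> R)
T = np.array([
    [[0.8, 0.2, 0.0], [0.1, 0.9, 0.0]],
    [[0.0, 0.7, 0.3], [0.5, 0.5, 0.0]],
    [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],
])
# R[s, a] is the expected immediate reward: R : S x A -> R
R = np.array([
    [0.0, 0.0],
    [0.0, 1.0],
    [0.0, 0.0],
])

def value_iteration(T, R, gamma=0.95, tol=1e-6):
    # V[s] approximates the optimal expected discounted return from state s.
    V = np.zeros(R.shape[0])
    while True:
        # Bellman optimality backup: Q(s,a) = R(s,a) + gamma * sum_s' T(s,a,s') V(s')
        Q = R + gamma * (T @ V)              # T @ V sums over successor states
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)   # value function and greedy policy
        V = V_new

V, policy = value_iteration(T, R)
print("V:", V, "policy:", policy)

Solving the MDP in this way yields both a value function and a greedy policy mapping each state to an action, which is the "behavior" referred to above.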
Recommended Reading
Albus, J. S. (1981). Brains, behavior, and robotics. Peterborough: BYTE. ISBN: 0070009759.
Andre, D., Friedman, N., & Parr, R. (1997). Generalized prioritized sweeping. Advances in Neural Information Processing Systems (NIPS), pp. 1001–1007.
Andre, D., & Russell, S. J. (2002). State abstraction for programmable reinforcement learning agents. Proceedings of the Eighteenth National Conference on Artificial Intelligence (AAAI).
Baird, L. C. (1995). Residual algorithms: reinforcement learning with function approximation. In A. Prieditis & S. Russell (Eds.), Machine Learning: Proceedings of the Twelfth International Conference (ICML95) (pp. 30–37). San Mateo: Morgan Kaufmann.
Bellman, R. E. (1957). Dynamic programming. Princeton: Princeton University Press.
Bertsekas, D. P., & Tsitsiklis, J. (1996). Neuro-dynamic programming. Belmont: Athena Scientific.
Dietterich, T. G. (2000). Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research, 13, 227–303.
Gordon, G. J. (1995). Stable function approximation in dynamic programming (Technical report CMU-CS-95-103). School of Computer Science, Carnegie Mellon University.
Guestrin, C., Koller, D., Parr, R., & Venkataraman, S. (2003). Efficient solution algorithms for factored MDPs. Journal of Artificial Intelligence Research, 19, 399–468.
Hansen, E. A., & Zilberstein, S. (1998). Heuristic search in cyclic AND/OR graphs. Proceedings of the Fifteenth National Conference on Artificial Intelligence. http://rbr.cs.umass.edu/shlomo/papers/HZaaai98.html
Howard, R. A. (1960). Dynamic programming and Markov processes. Cambridge: MIT Press.
Kocsis, L., & Szepesvári, C. (2006). Bandit based Monte-Carlo planning. European Conference on Machine Learning (ECML). Lecture Notes in Computer Science 4212, Springer, pp. 282–293.
Moore, A. W., & Atkeson, C. G. (1993). Prioritized sweeping: reinforcement learning with less data and less real time. Machine Learning, 13, 103–130.
Moore, A. W., Baird, L., & Kaelbling, L. P. (1999). Multi-value-functions: efficient automatic action hierarchies for multiple goal MDPs. International Joint Conference on Artificial Intelligence (IJCAI99).
Munos, R., & Moore, A. W. (2001). Variable resolution discretization in optimal control. Machine Learning, 1, 1–31.
Puterman, M. L. (1994). Markov decision processes: discrete stochastic dynamic programming. Wiley series in probability and mathematical statistics. Applied probability and statistics section. New York: Wiley. ISBN: 0-471-61977-9.
St-Aubin, R., Hoey, J., & Boutilier, C. (2000). APRICODD: approximate policy construction using decision diagrams. Advances in Neural Information Processing Systems (NIPS 2000).
Sutton, R. S., Precup, D., & Singh, S. (1998). Intra-option learning about temporally abstract actions. Machine Learning: Proceedings of the Fifteenth International Conference (ICML98), Morgan Kaufmann, Madison, pp. 556–564.
Tsitsiklis, J. N., & Van Roy, B. (1997). An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42(5), 674–690.