Abstract
We study a Markov decision process with countable state and action spaces and bounded rewards, earned up to the entrance into an absorbing set. Two optimality criteria are considered: the classical total-reward criterion and a target-level criterion. For all discount factors smaller than a critical one, the standard results of dynamic programming (optimality equation, optimality of a decision rule, value iteration) are shown to hold. Value iteration is combined with an extrapolation that yields upper and lower bounds on the value function at each step of the iteration. The asymptotic behavior of the extrapolation method, as well as the characterization of the critical discount factor, rests on Perron-Frobenius theory for nonlinear operators. The special case of a Markov decision model with a random horizon is studied in detail. Finally, as a byproduct, an efficient computation of the mean entrance time of a Markov chain into an absorbing set is obtained.
© 2006 CISM, Udine
Cite this chapter
Waldmann, KH. (2006). On Markov decision models with an absorbing set. In: Della Riccia, G., Dubois, D., Kruse, R., Lenz, HJ. (eds) Decision Theory and Multi-Agent Planning. CISM International Centre for Mechanical Sciences, vol 482. Springer, Vienna. https://doi.org/10.1007/3-211-38167-8_9
DOI: https://doi.org/10.1007/3-211-38167-8_9
Publisher Name: Springer, Vienna
Print ISBN: 978-3-211-31787-7
Online ISBN: 978-3-211-38167-0
eBook Packages: Engineering (R0)