Abstract
This paper studies continuous-time Markov decision processes with general state and action spaces, under the long-run expected average reward criterion. The transition rates of the underlying continuous-time Markov processes are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. We provide new sufficient conditions for the existence of average optimal policies. Moreover, these sufficient conditions are imposed on the primitive data of the controlled process and are therefore directly verifiable. Finally, we apply our results to two new examples.
Additional information
Research supported by NSFC.
Cite this article
Ye, L., Guo, X. New sufficient conditions for average optimality in continuous-time Markov decision processes. Math Meth Oper Res 72, 75–94 (2010). https://doi.org/10.1007/s00186-010-0307-4
Keywords
- Average reward criterion
- Continuous-time Markov decision process
- Unbounded transition and reward rates
- Optimality two-inequality approach
- Optimal stationary policy