
Total reward criteria for unconstrained/constrained continuous-time Markov decision processes

Abstract

This paper studies denumerable continuous-time Markov decision processes with expected total reward criteria. The authors first study the unconstrained model with possibly unbounded transition rates and give suitable conditions on the controlled system's primitive data under which they show the existence of a solution to the total reward optimality equation and the existence of an optimal stationary policy. Then, the authors impose a constraint on an expected total cost and consider the associated constrained model. Based on the results for the unconstrained model and using the Lagrange multipliers approach, they prove the existence of constrained-optimal policies under some additional conditions. Finally, the results are applied to controlled queueing systems.
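For orientation only (the paper's precise assumptions, model data, and notation are not reproduced in this preview), the total reward optimality equation for a denumerable-state continuous-time MDP with state space S, admissible action sets A(i), reward rates r(i,a), and conservative transition rates q(j|i,a) is typically of the form

\[
\sup_{a \in A(i)} \Big\{ r(i,a) + \sum_{j \in S} q(j \mid i,a)\, u(j) \Big\} = 0, \qquad i \in S,
\]

and, for the constrained model in which an expected total cost C(\pi) must not exceed a constant \kappa, the Lagrange multipliers approach studies the unconstrained problems

\[
\sup_{\pi} \Big\{ V(\pi) - \lambda \big( C(\pi) - \kappa \big) \Big\}, \qquad \lambda \ge 0,
\]

where V(\pi) denotes the expected total reward of policy \pi. The symbols u, V, C, \kappa, and \lambda here are illustrative placeholders and may differ from the notation used in the paper itself.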



Author information

Corresponding author

Correspondence to Xianping Guo.

Additional information

This research was supported by the National Natural Science Foundation of China under Grant Nos. 10925107 and 60874004.

This paper was recommended for publication by Editor Guohua ZOU.


About this article

Cite this article

Guo, X., Zhang, L. Total reward criteria for unconstrained/constrained continuous-time Markov decision processes. J Syst Sci Complex 24, 491–505 (2011). https://doi.org/10.1007/s11424-011-8004-9

