
Discounted Continuous-Time Controlled Markov Chains: Convergence of Control Models

Published online by Cambridge University Press: 30 January 2018

Tomás Prieto-Rumeau*
Affiliation: Universidad Nacional de Educación a Distancia

Onésimo Hernández-Lerma**
Affiliation: CINVESTAV-IPN

* Postal address: Departamento de Estadística, Facultad de Ciencias, Universidad Nacional de Educación a Distancia, Calle Senda del Rey 9, 28040 Madrid, Spain. Email address: tprieto@ccia.uned.es
** Postal address: Departamento de Matemáticas, CINVESTAV-IPN, México D.F. 07000, México.
Rights & Permissions [Opens in a new window]

Abstract


We are interested in continuous-time, denumerable-state controlled Markov chains (CMCs) with compact Borel action sets and possibly unbounded transition and reward rates, under the discounted reward optimality criterion. For such CMCs, we propose a definition of a sequence of control models {ℳn} converging to a given control model ℳ; this convergence ensures that the discounted optimal rewards and policies of the ℳn converge to those of ℳ. As an application, we propose a finite-state and finite-action truncation technique for the original control model ℳ, which we illustrate by numerically approximating the optimal reward and policy of a controlled population system with catastrophes. We also study the corresponding convergence rates.
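
To make the truncation idea concrete, here is a minimal sketch, not taken from the paper: it solves a finite-state, finite-action discounted model by value iteration after uniformization, for an invented controlled birth-death population with catastrophes. Every rate, reward, and parameter below (N, the action grid, alpha, mu, kappa) is an assumption chosen for illustration, not the paper's numerical example.

import numpy as np

# Illustrative truncated model: states 0..N, finite action grid.
N = 30                               # truncation level of the denumerable state space
actions = np.linspace(0.0, 1.0, 5)   # finite grid standing in for a compact action set
alpha = 0.1                          # discount rate
mu, kappa = 1.0, 0.05                # per-individual death rate, catastrophe rate

def rates(i, a):
    # Transition rates q(j | i, a) out of state i, returned as {j: rate}.
    q = {}
    if i < N:
        q[i + 1] = a * i + 0.5        # controlled birth rate (assumed form)
    if i > 0:
        q[i - 1] = mu * i             # individual deaths
        q[0] = q.get(0, 0.0) + kappa  # catastrophe empties the population
    return q

def reward(i, a):
    # Reward rate r(i, a): payoff per individual minus a control cost (assumed form).
    return 2.0 * i - 10.0 * a * i

# Uniformization constant: a bound on the total exit rates of the truncated model.
Lam = max(sum(rates(i, a).values()) for i in range(N + 1) for a in actions)

# Value iteration on the uniformized fixed point
#   V(i) = max_a [ r(i,a) + Lam * E_{p(.|i,a)} V ] / (alpha + Lam),
# a contraction with modulus Lam / (alpha + Lam) < 1.
V = np.zeros(N + 1)
for _ in range(20000):
    V_new = np.empty_like(V)
    for i in range(N + 1):
        best = -np.inf
        for a in actions:
            q = rates(i, a)
            out = sum(q.values())
            # Uniformized one-step expectation: stay put with probability 1 - out/Lam.
            ev = (1.0 - out / Lam) * V[i] + sum(r / Lam * V[j] for j, r in q.items())
            best = max(best, (reward(i, a) + Lam * ev) / (alpha + Lam))
        V_new[i] = best
    done = np.max(np.abs(V_new - V)) < 1e-8
    V = V_new
    if done:
        break

A greedy policy can then be read off by recording, at each state, the action attaining the maximum; the convergence results in the paper concern how such truncated values and policies approach those of the original denumerable-state model as the truncation level N grows.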

Type: Research Article
Copyright: © Applied Probability Trust
