Abstract
Average cost Markov decision processes (MDPs) with compact state and action spaces and bounded lower semicontinuous cost functions are considered. Under the hypothesis of Doeblin, Kurano [7] treated the general case in which the Markov process induced by any randomized stationary policy may have several ergodic classes and a transient set, and showed the existence of a minimum pair of state and policy. This paper considers the same setting as Kurano [7] and proves new results that yield an existence theorem for an optimal stationary policy under certain reasonable conditions.
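To make the multichain setting concrete, here is a minimal Python sketch under strong simplifying assumptions. The state space is finite (not compact), a stationary policy has already been fixed so that the induced chain is given by a transition matrix `P` with per-state cost `c`, and the helper names `ergodic_classes` and `average_cost` are hypothetical, not taken from the paper. The sketch splits the chain into closed communicating (ergodic) classes and a transient set, then approximates the long-run average cost g(x) = lim (1/N) E_x[c(X_0) + ... + c(X_{N-1})] by a finite-horizon Cesàro average. It illustrates why, with several ergodic classes, the average cost of a single policy can depend on the initial state.

```python
from typing import List, Sequence, Set, Tuple

# Assumption: a finite chain already induced by a fixed stationary policy.
Matrix = Sequence[Sequence[float]]


def reachability(P: Matrix) -> List[Set[int]]:
    """reach[i] = set of states reachable from i in zero or more steps."""
    n = len(P)
    reach = []
    for i in range(n):
        seen = {i}
        stack = [i]
        while stack:  # depth-first search over positive-probability transitions
            x = stack.pop()
            for y in range(n):
                if P[x][y] > 0 and y not in seen:
                    seen.add(y)
                    stack.append(y)
        reach.append(seen)
    return reach


def ergodic_classes(P: Matrix) -> Tuple[List[Set[int]], Set[int]]:
    """Hypothetical helper: split a finite chain into closed classes and a transient set."""
    n = len(P)
    reach = reachability(P)
    classes: List[Set[int]] = []
    assigned: Set[int] = set()
    for i in range(n):
        if i in assigned:
            continue
        # In a finite chain, i is recurrent iff every state it reaches can reach i back;
        # then reach[i] is exactly the closed communicating class containing i.
        if all(i in reach[j] for j in reach[i]):
            cls = set(reach[i])
            classes.append(cls)
            assigned |= cls
    transient = set(range(n)) - assigned
    return classes, transient


def average_cost(P: Matrix, c: Sequence[float], horizon: int = 20000) -> List[float]:
    """Hypothetical helper: Cesaro approximation of the long-run average cost g(x)."""
    n = len(P)
    v = list(c)  # v[x] = E_x[c(X_t)], starting at t = 0
    total = [0.0] * n
    for _ in range(horizon):
        for x in range(n):
            total[x] += v[x]
        # E_x[c(X_{t+1})] = sum_y P(x, y) * E_y[c(X_t)]
        v = [sum(P[x][y] * v[y] for y in range(n)) for x in range(n)]
    return [s / horizon for s in total]


if __name__ == "__main__":
    # State 0: absorbing, cost 1. States 1 and 2: a periodic class with costs 0 and 4.
    # State 3: transient, moves to 0 or 1 with probability 1/2 each.
    P = [
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.5, 0.5, 0.0, 0.0],
    ]
    c = [1.0, 0.0, 4.0, 10.0]
    classes, transient = ergodic_classes(P)
    print(classes, transient)  # [{0}, {1, 2}] {3}
    print([round(g, 3) for g in average_cost(P, c)])  # [1.0, 2.0, 2.0, 1.5]
```

In the example, each ergodic class has its own average cost (1 and 2), and the transient state 3 mixes them according to its absorption probabilities, giving 0.5·1 + 0.5·2 = 1.5. The finite horizon only approximates the limit; exact values would instead come from solving for the stationary distribution of each class and the absorption probabilities of the transient states.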
References
[1] D.P. Bertsekas and S.D. Shreve, Stochastic Optimal Control: The Discrete Time Case (Academic Press, 1978).
[2] V.S. Borkar, Controlled Markov chains and stochastic networks, SIAM J. Control Optim. 21 (1983) 652–666.
[3] V.S. Borkar, On minimum cost per unit time control of Markov chains, SIAM J. Control Optim. 22 (1984) 965–978.
[4] J.L. Doob, Stochastic Processes (Wiley, New York, 1953).
[5] S.N. Ethier and T.G. Kurtz, Markov Processes, Characterization and Convergence (Wiley, New York, 1986).
[6] M. Kurano, Markov decision processes with a Borel measurable cost function: the average case, Math. Oper. Res. 11 (1986) 309–320.
[7] M. Kurano, The existence of a minimum pair of state and policy for Markov decision processes under the hypothesis of Doeblin, SIAM J. Control Optim. 27 (1989) 296–307.
[8] M. Loève, Probability Theory, 2nd ed. (Van Nostrand, Princeton, NJ, 1960).
[9] S.M. Ross, Arbitrary state Markovian decision processes, Ann. Math. Statist. 39 (1968) 2118–2122.
[10] R.E. Strauch, Negative dynamic programming, Ann. Math. Statist. 37 (1966) 871–890.
[11] H.C. Tijms, On dynamic programming with arbitrary state space, compact action space and the average return as criterion, Report BW 55/75, Math. Centrum, Amsterdam (1975).
[12] J. Wijngaard, Stationary Markovian decision problems and perturbation theory of quasi-compact linear operators, Math. Oper. Res. 2 (1977) 91–102.
Cite this article
Kurano, M. Average cost Markov decision processes under the hypothesis of Doeblin. Ann Oper Res 29, 375–385 (1991). https://doi.org/10.1007/BF02283606