Abstract
This note deals with Markov decision chains evolving on a denumerable state space. Under standard continuity-compactness requirements, an explicit example is provided to show that, with respect to a strong sample-path average reward criterion, the Lyapunov function condition does not ensure the existence of an optimal stationary policy.
References
Hordijk, A.: Dynamic Programming and Potential Theory. Mathematical Centre Tract, vol. 51. Mathematisch Centrum, Amsterdam (1974)
Cavazos-Cadena, R., Montes-de-Oca, R.: Sample-path optimality in average Markov decision chains under a double Lyapunov function condition. In: Hernández-Hernández, D., Minjárez-Sosa, A. (eds.) Optimization, Control, and Applications of Stochastic Systems, In Honor of Onésimo Hernández-Lerma, pp. 31–57. Springer, New York (2012)
Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, New York (1994)
Thomas, L.C.: Connectedness conditions for denumerable state Markov decision processes. In: Hartley, R., Thomas, L.C., White, D.J. (eds.) Recent Developments in Markov Decision Processes, pp. 181–204. Academic Press, London (1980)
Cavazos-Cadena, R., Fernández-Gaucherand, E.: Denumerable controlled Markov chains with average reward criterion: sample path optimality. Math. Methods Oper. Res. 41, 89–108 (1995)
Lasserre, J.B.: Sample-path average optimality for Markov control processes. IEEE Trans. Autom. Control 44, 1966–1971 (1999)
Hunt, F.Y.: Sample path optimality for a Markov optimization problem. Stoch. Process. Appl. 115, 769–779 (2005)
Ross, S.M.: Applied Probability Models with Optimization Applications. Holden-Day, Oakland (1970)
Acknowledgements
This work was supported in part by the PSF Organization under Grant No. 012/300/02, and by CONACYT (México) and ASCR (Czech Republic) under Grant No. 171396.
The authors are grateful to the editor for helpful suggestions.
Cite this article
Cavazos-Cadena, R., Montes-de-Oca, R. & Sladký, K. A Counterexample on Sample-Path Optimality in Stable Markov Decision Chains with the Average Reward Criterion. J Optim Theory Appl 163, 674–684 (2014). https://doi.org/10.1007/s10957-013-0474-6
DOI: https://doi.org/10.1007/s10957-013-0474-6