Abstract.
This work concerns finite-state Markov decision chains endowed with the long-run average reward criterion. Assuming that the optimality equation has a solution, it is shown that a nearly optimal stationary policy, as well as an approximation to the optimal average reward within a specified error, can be obtained in a finite number of steps of the value iteration method. These results extend others already available in the literature, which were established under more stringent restrictions on the ergodic structure of the decision process.
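The value iteration scheme the abstract refers to can be illustrated with a small sketch. The following is not the paper's construction, but the standard (relative) value iteration for the average reward criterion on a hypothetical two-state, two-action chain, with the classical span-based stopping rule: for any vector v, the minimum and maximum of Tv − v bracket the optimal gain, and a greedy policy is read off once the span is small.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP; the transition matrices P[a] and
# rewards r[a] below are illustrative, not taken from the paper.
P = np.array([
    [[0.9, 0.1],    # action 0
     [0.2, 0.8]],
    [[0.5, 0.5],    # action 1
     [0.6, 0.4]],
])
r = np.array([
    [1.0, 0.0],     # reward for action 0 in states 0, 1
    [2.0, -1.0],    # reward for action 1 in states 0, 1
])

def relative_value_iteration(P, r, tol=1e-8, max_iter=10_000):
    """Value iteration under the long-run average reward criterion.

    Returns a greedy stationary policy together with lower and upper
    bounds (lo, hi) on the optimal average reward; the iteration stops
    once the span hi - lo of the value differences falls below tol.
    """
    n_actions, n_states, _ = P.shape
    v = np.zeros(n_states)
    for _ in range(max_iter):
        # One dynamic-programming step: Q[a, s] = r[a, s] + sum_t P[a, s, t] v[t]
        Q = r + P @ v
        v_new = Q.max(axis=0)          # (Tv)(s)
        diff = v_new - v               # Tv - v bounds the optimal gain
        lo, hi = diff.min(), diff.max()
        # Subtract a constant so the iterates stay bounded (relative VI).
        v = v_new - v_new[0]
        if hi - lo < tol:
            break
    policy = Q.argmax(axis=0)          # greedy stationary policy
    return policy, lo, hi

policy, lo, hi = relative_value_iteration(P, r)
print("greedy policy:", policy)
print("optimal average reward lies in [%.6f, %.6f]" % (lo, hi))
```

The finite-step guarantee discussed in the paper corresponds to the stopping rule above: whenever the span falls below a prescribed tolerance, the greedy policy is nearly optimal and the interval [lo, hi] approximates the optimal average reward within that tolerance.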
Additional information
Manuscript received: October 2001/Final version received: February 2002
The support of the PSF Organization under Grant No. 010/300/01-1 is gratefully acknowledged.
Cite this article
Cavazos-Cadena, R. Value iteration and approximately optimal stationary policies in finite-state average Markov decision chains. Mathematical Methods of OR 56, 181–196 (2002). https://doi.org/10.1007/s001860200205