Abstract.
We consider discrete-time Markov Decision Processes (MDPs) with finite state and action spaces under the average reward optimality criterion. The decomposition theory of Ross and Varadarajan [11] leads to a natural partition of the state space into strongly communicating classes and a set of states that are transient under all stationary strategies. An optimal pure strategy can then be obtained from an optimal strategy for some smaller, aggregated MDP. This decomposition yields an efficient method for solving large-scale MDPs. In this paper, we consider deterministic MDPs and construct a simple algorithm, based on graph theory, to determine an aggregated optimal policy. For MDPs without cycles, we propose an algorithm for computing aggregated optimal strategies; in the general case, we propose new improvement algorithms for the same task.
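For a deterministic MDP, each action maps a state to a single successor, so the transition structure is a directed graph, and the strongly communicating classes of the Ross–Varadarajan decomposition correspond to strongly connected components (SCCs) of that graph. The sketch below, which is an illustrative assumption and not code from the paper, computes the SCCs of such a transition graph with Kosaraju's two-pass algorithm; the `succ` map (state to list of successors, one per action) and the function name are hypothetical.

```python
from collections import defaultdict

def strongly_connected_components(succ):
    """Kosaraju's algorithm over a deterministic MDP's transition graph.

    succ: dict mapping each state to a list of successor states
    (one entry per available action). Every state must appear as a key,
    possibly with an empty successor list.
    Returns a list of components, each a list of states.
    """
    # First pass: iterative DFS on the forward graph, recording finish order.
    visited, order = set(), []
    for s in succ:
        if s in visited:
            continue
        visited.add(s)
        stack = [(s, iter(succ.get(s, ())))]
        while stack:
            node, it = stack[-1]
            advanced = False
            for nxt in it:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(succ.get(nxt, ()))))
                    advanced = True
                    break
            if not advanced:
                order.append(node)  # node is finished
                stack.pop()
    # Build the reverse graph.
    rev = defaultdict(list)
    for u, vs in succ.items():
        for v in vs:
            rev[v].append(u)
    # Second pass: DFS on the reverse graph in decreasing finish order;
    # each tree found is one strongly connected component.
    assigned, components = set(), []
    for s in reversed(order):
        if s in assigned:
            continue
        comp, stack = [], [s]
        assigned.add(s)
        while stack:
            u = stack.pop()
            comp.append(u)
            for v in rev.get(u, ()):
                if v not in assigned:
                    assigned.add(v)
                    stack.append(v)
        components.append(comp)
    return components
```

Components containing a cycle are candidate strongly communicating classes, while singleton components with no self-loop are transient under every stationary strategy; the aggregated MDP is then built over these classes.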
Manuscript received: September 2000/Final version received: December 2000
Abbad, M., Daoui, C. Algorithms for aggregated limiting average Markov decision problems. Mathematical Methods of OR 53, 451–463 (2001). https://doi.org/10.1007/s001860100117