Abstract
In this paper we present a new algorithm for multichain finite-state Markov decision processes that finds an average optimal policy by decomposing the state space into communicating classes and a transient class. For each communicating class a relatively optimal policy is computed, and these policies are then combined into an average optimal policy by the value iteration algorithm. The decomposition itself is carried out efficiently by means of a pattern matrix that determines the behaviour pattern of the decision process, so the proposed algorithm simplifies the structured algorithm of Leizarowitz (Math Oper Res 28:553–586, 2003). A numerical example illustrates the algorithm.
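To make the decomposition step concrete, the following sketch splits the state space of a finite MDP into closed communicating classes and a transient class. It uses a plain reachability computation (an edge s → t exists whenever some action moves s to t with positive probability) rather than the paper's pattern-matrix construction; the dictionary format for `transitions` and the function name are illustrative assumptions, not the authors' notation.

```python
def decompose_states(transitions):
    """Split the state space of a finite MDP into closed communicating
    classes and a transient class.

    transitions[s] maps each action to a dict {next_state: probability}.
    NOTE: this is a generic reachability-based sketch, not the
    pattern-matrix method of the paper.
    """
    states = list(transitions)
    # One-step edges: s -> t if some action reaches t with positive prob.
    edge = {s: {t for probs in transitions[s].values()
                for t, p in probs.items() if p > 0}
            for s in states}
    # Transitive closure: reach[s] = all states reachable from s.
    reach = {s: {s} | edge[s] for s in states}
    changed = True
    while changed:
        changed = False
        for s in states:
            new = reach[s].union(*(reach[t] for t in reach[s]))
            if new != reach[s]:
                reach[s] = new
                changed = True
    # s and t communicate iff each reaches the other.
    classes, assigned = [], set()
    for s in states:
        if s in assigned:
            continue
        comp = {t for t in states if t in reach[s] and s in reach[t]}
        classes.append(comp)
        assigned |= comp
    # A communicating class is closed if no action can leave it.
    closed = [c for c in classes
              if all(t in c for s in c for t in edge[s])]
    transient = set(states) - set().union(*closed) if closed else set(states)
    return closed, transient
```

For example, with states {1, 2, 3, 4} where state 1 can move to state 2, state 2 is absorbing, and states 3 and 4 cycle between each other, the function returns the closed classes {2} and {3, 4} and the transient class {1}. A relatively optimal policy would then be computed on each closed class before the value iteration step described in the abstract.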
References
Bather J (1973) Optimal decision procedures for finite Markov chains. II. Communicating systems. Adv Appl Probab 5:521–540
Bellman R (1957) Dynamic programming. Princeton University Press, Princeton, NJ
Denardo EV (1967) Contraction mappings in the theory underlying dynamic programming. SIAM Rev 9:165–177
Denardo EV (1982) Dynamic programming: models and applications. Prentice-Hall Inc., Englewood Cliffs, NJ
Federgruen A, Schweitzer PJ (1978) Discounted and undiscounted value-iteration in Markov decision problems: a survey. In: Dynamic programming and its applications. Proceedings of the conference, University of British Columbia, Vancouver, BC, 1977. Academic, New York, pp 23–52
Hordijk A, Kallenberg LCM (1979) Linear programming and Markov decision chains. Manage Sci 25(4):352–362
Hordijk A, Puterman ML (1987) On the convergence of policy iteration in finite state undiscounted Markov decision processes: the unichain case. Math Oper Res 12(1):163–176
Howard RA (1960) Dynamic programming and Markov processes. The Technology Press of MIT, Cambridge
Kemeny JG, Snell JL (1960) Finite Markov chains. In: The University series in undergraduate mathematics. D. Van Nostrand Co. Inc., Princeton-Toronto-London-New York
Leizarowitz A (2003) An algorithm to identify and compute average optimal policies in multichain Markov decision processes. Math Oper Res 28(3):553–586
Puterman ML (1994) Markov decision processes: discrete stochastic dynamic programming. Wiley, New York (A Wiley-Interscience Publication)
Schweitzer PJ (1971) Iterative solution of the functional equations of undiscounted Markov renewal programming. J Math Anal Appl 34:495–501
White DJ (1963) Dynamic programming, Markov chains, and the method of successive approximations. J Math Anal Appl 6:373–376
Cite this article
Iki, T., Horiguchi, M. & Kurano, M. A structured pattern matrix algorithm for multichain Markov decision processes. Math Meth Oper Res 66, 545–555 (2007). https://doi.org/10.1007/s00186-006-0138-5