
A structured pattern matrix algorithm for multichain Markov decision processes

  • Original Article
  • Mathematical Methods of Operations Research

Abstract

In this paper, we are concerned with a new algorithm for multichain finite state Markov decision processes which finds an average optimal policy through the decomposition of the state space into communicating classes and a transient class. For each communicating class, a relatively optimal policy is found; these policies are then used to obtain an average optimal policy by applying the value iteration algorithm. Using a pattern matrix that determines the behaviour pattern of the decision process, the decomposition of the state space is carried out effectively, so that the proposed algorithm simplifies the structured algorithm given by Leizarowitz (Math Oper Res 28:553–586, 2003). A numerical example is given to illustrate the algorithm.
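
To make the decomposition step described in the abstract concrete, the following Python sketch (not taken from the paper) builds a 0/1 pattern matrix of possible one-step transitions, extracts communicating classes as strongly connected components, separates closed classes from transient states, and runs a plain discounted value iteration within one class as a stand-in for the average-reward computation treated in the paper. The function names and the SciPy-based decomposition are illustrative assumptions, not the authors' procedure.

```python
# Illustrative sketch only: decompose the state space of a finite MDP into
# communicating classes and a transient class via a 0/1 "pattern matrix" of
# possible one-step transitions, then optimize within a closed class.
# P is assumed to be a list of (n x n) transition matrices, one per action;
# r is a list of per-state reward vectors, one per action (hypothetical names).

import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components


def pattern_matrix(P):
    """Entry (s, s') is 1 iff some action can move s -> s' with positive probability."""
    return (sum(P) > 0).astype(int)


def decompose(P):
    """Split states into closed communicating classes and a transient class."""
    M = pattern_matrix(P)
    n = M.shape[0]
    # Communicating classes = strongly connected components of the pattern graph.
    n_comp, labels = connected_components(csr_matrix(M), connection="strong")
    classes = [np.flatnonzero(labels == c) for c in range(n_comp)]
    closed, transient = [], []
    for C in classes:
        outside = np.setdiff1d(np.arange(n), C)
        # A class is closed if no pattern-matrix edge leaves it; otherwise
        # its states are transient.
        if M[np.ix_(C, outside)].sum() == 0:
            closed.append(C)
        else:
            transient.extend(C.tolist())
    return closed, np.array(transient, dtype=int)


def value_iteration(P, r, states, gamma=0.999, tol=1e-8):
    """Discounted value iteration restricted to a closed class `states`.
    (A simplification; the paper works with average-reward optimality.)"""
    idx = np.asarray(states)
    V = np.zeros(len(idx))
    while True:
        Q = np.stack([r[a][idx] + gamma * P[a][np.ix_(idx, idx)] @ V
                      for a in range(len(P))])
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)  # values and a greedy policy on the class
        V = V_new
```

A policy for the whole state space would then be assembled by solving each closed class separately and extending the result to the transient states, which is the role the value iteration step plays in the algorithm described above.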


References

  • Bather J (1973) Optimal decision procedures for finite Markov chains. II. Communicating systems. Adv Appl Probab 5:521–540

  • Bellman R (1957) Dynamic programming. Princeton University Press, Princeton, NJ

  • Denardo EV (1967) Contraction mappings in the theory underlying dynamic programming. SIAM Rev 9:165–177

  • Denardo EV (1982) Dynamic programming: models and applications. Prentice-Hall Inc., Englewood Cliffs, NJ

  • Federgruen A, Schweitzer PJ (1978) Discounted and undiscounted value-iteration in Markov decision problems: a survey. In: Dynamic programming and its applications. Proceedings of the conference, University of British Columbia, Vancouver, BC, 1977. Academic, New York, pp 23–52

  • Hordijk A, Kallenberg LCM (1979) Linear programming and Markov decision chains. Manage Sci 25(4):352–362

  • Hordijk A, Puterman ML (1987) On the convergence of policy iteration in finite state undiscounted Markov decision processes: the unichain case. Math Oper Res 12(1):163–176

  • Howard RA (1960) Dynamic programming and Markov processes. The Technology Press of MIT, Cambridge

  • Kemeny JG, Snell JL (1960) Finite Markov chains. In: The University series in undergraduate mathematics. D. Van Nostrand Co. Inc., Princeton-Toronto-London-New York

  • Leizarowitz A (2003) An algorithm to identify and compute average optimal policies in multichain Markov decision processes. Math Oper Res 28(3):553–586

  • Puterman ML (1994) Markov decision processes: discrete stochastic dynamic programming. Wiley, New York (A Wiley-Interscience Publication)

  • Schweitzer PJ (1971) Iterative solution of the functional equations of undiscounted Markov renewal programming. J Math Anal Appl 34:495–501

  • White DJ (1963) Dynamic programming, Markov chains, and the method of successive approximations. J Math Anal Appl 6:373–376

Author information

Corresponding author

Correspondence to Masayuki Horiguchi.

About this article

Cite this article

Iki, T., Horiguchi, M. & Kurano, M. A structured pattern matrix algorithm for multichain Markov decision processes. Math Meth Oper Res 66, 545–555 (2007). https://doi.org/10.1007/s00186-006-0138-5
