
A value iteration method for undiscounted multichain Markov decision processes


Abstract

This paper proposes a value iteration method which finds an ε-optimal policy of an undiscounted multichain Markov decision process in a finite number of iterations. The undiscounted multichain Markov decision process is reduced to an aggregated Markov decision process, which utilizes maximal gains of undiscounted Markov decision sub-processes and is formulated as an optimal stopping problem. As a preliminary, sufficient conditions are presented under which a policy is ε-optimal.

Zusammenfassung

This paper proposes a value iteration method that yields an ε-optimal policy for an undiscounted non-irreducible (multichain) Markov decision process (MDP) in finitely many steps. The undiscounted multichain MDP is reduced to an aggregated MDP, which uses the maximal gains of undiscounted sub-MDPs and is formulated as an optimal stopping problem. To begin with, sufficient conditions for the ε-optimality of a policy are given.
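As a rough background illustration of the kind of iteration the abstract refers to, the sketch below implements plain relative value iteration for an undiscounted (average-reward) MDP under a unichain, aperiodic assumption; the function name, the toy data, and that assumption are illustrative and not taken from the paper, whose actual method handles the general multichain case via an aggregated MDP and an optimal stopping formulation.

```python
import numpy as np

# Minimal sketch (not the paper's algorithm): relative value iteration for an
# undiscounted, average-reward MDP, assuming a unichain and aperiodic model so
# that the span of successive value differences certifies an eps-optimal
# greedy policy.  P[a] is the transition matrix and r[a] the reward vector of
# action a; both are illustrative placeholders.
def relative_value_iteration(P, r, eps=1e-6, max_iter=10_000):
    n_states = P.shape[1]
    v = np.zeros(n_states)
    for _ in range(max_iter):
        # One-step lookahead: q[a, i] = r[a, i] + sum_j P[a, i, j] * v[j]
        q = r + np.einsum('aij,j->ai', P, v)
        v_new = q.max(axis=0)
        diff = v_new - v
        span = diff.max() - diff.min()      # span seminorm of the update
        if span < eps:
            # For a unichain, aperiodic MDP the components of diff bracket the
            # optimal gain, and the greedy policy w.r.t. q is eps-optimal.
            gain = 0.5 * (diff.max() + diff.min())
            return gain, v_new - v_new[0], q.argmax(axis=0)
        v = v_new - v_new[0]                # re-centre to keep iterates bounded
    raise RuntimeError("no eps-optimal policy found within max_iter sweeps")

# Tiny two-state, two-action example (illustrative data only).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]])
r = np.array([[1.0, 0.0],
              [0.5, 0.8]])
gain, bias, policy = relative_value_iteration(P, r)
print(gain, policy)
```

The multichain case treated in the paper is harder precisely because different recurrent classes can have different gains, so a single scalar gain and the simple span test above no longer suffice; that is what the aggregation and optimal-stopping construction addresses.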




Cite this article

Ohno, K. A value iteration method for undiscounted multichain Markov decision processes. Zeitschrift für Operations Research 32, 71–93 (1988). https://doi.org/10.1007/BF01919182
