Abstract
This paper proposes a value iteration method which finds an ε-optimal policy of an undiscounted multichain Markov decision process in a finite number of iterations. The undiscounted multichain Markov decision process is reduced to an aggregated Markov decision process, which utilizes maximal gains of undiscounted Markov decision sub-processes and is formulated as an optimal stopping problem. As a preliminary, sufficient conditions are presented under which a policy is ε-optimal.
Summary
This paper proposes a value iteration method that yields an ε-optimal policy for an undiscounted non-irreducible Markov decision process (MDP) in finitely many steps. The undiscounted non-irreducible MDP is reduced to an aggregated MDP, which uses the maximal gains of undiscounted sub-MDPs and is formulated as an optimal stopping problem. To begin, sufficient conditions for the ε-optimality of a policy are given.
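For orientation, the kind of iteration and stopping test the abstract refers to can be sketched on the simpler unichain case. The block below is not the aggregated multichain method of this paper; it is a minimal sketch of classical undiscounted value iteration with the span-based gain bounds found in the older literature, assuming a unichain, aperiodic model. The function name `value_iteration`, the argument layout `P[a][s][t]` and `r[a][s]`, and the two-state example are all hypothetical choices made for illustration.

```python
def value_iteration(P, r, eps=1e-6, max_iter=10_000):
    """Classical undiscounted value iteration (unichain, aperiodic case only).

    P[a][s][t] is the probability of moving from state s to t under action a,
    r[a][s] is the one-step reward. Returns (policy, lower, upper), where
    lower and upper bracket the maximal gain once upper - lower < eps.
    """
    n_actions, n_states = len(P), len(P[0])
    v = [0.0] * n_states
    for _ in range(max_iter):
        # One-step lookahead value of every action in every state.
        q = [[r[a][s] + sum(P[a][s][t] * v[t] for t in range(n_states))
              for a in range(n_actions)]
             for s in range(n_states)]
        new_v = [max(row) for row in q]
        policy = [row.index(max(row)) for row in q]  # greedy policy
        # Componentwise increments; their min and max bound the maximal gain.
        diff = [nv - ov for nv, ov in zip(new_v, v)]
        lower, upper = min(diff), max(diff)
        if upper - lower < eps:
            return policy, lower, upper
        # Shift by a reference state so the iterates stay bounded;
        # a constant shift leaves the increments unchanged.
        v = [x - new_v[0] for x in new_v]
    raise RuntimeError("no convergence within max_iter iterations")


# Hypothetical two-state, two-action example: both actions share the same
# transition matrix, so only the rewards distinguish them.
P_demo = [[[0.5, 0.5], [0.5, 0.5]],
          [[0.5, 0.5], [0.5, 0.5]]]
r_demo = [[1.0, 0.0],   # action 0
          [0.0, 2.0]]   # action 1
demo_policy, demo_lower, demo_upper = value_iteration(P_demo, r_demo)
```

In this toy model the greedy policy picks action 0 in state 0 and action 1 in state 1, and both bounds settle at a gain of 1.5. In a multichain model the maximal gain can differ from state to state, so a single scalar bracket of this kind is no longer enough, which is the difficulty the paper's reduction to an aggregated process addresses.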
Cite this article
Ohno, K. A value iteration method for undiscounted multichain Markov decision processes. Zeitschrift für Operations Research 32, 71–93 (1988). https://doi.org/10.1007/BF01919182