A value iteration method for undiscounted multichain Markov decision processes

Ohno, K.

doi:10.1007/BF01919182

Published: March 1988

Volume 32, pages 71–93 (1988)

Zeitschrift für Operations Research

K. Ohno¹

Abstract

This paper proposes a value iteration method which finds an ε-optimal policy of an undiscounted multichain Markov decision process in a finite number of iterations. The undiscounted multichain Markov decision process is reduced to an aggregated Markov decision process, which utilizes maximal gains of undiscounted Markov decision sub-processes and is formulated as an optimal stopping problem. As a preliminary, sufficient conditions are presented under which a policy is ε-optimal.

Zusammenfassung (English translation)

This paper proposes a value iteration method that yields an ε-optimal policy for an undiscounted non-irreducible Markov decision process (MDP) in finitely many steps. The undiscounted non-irreducible MDP is reduced to an aggregated MDP, which uses the maximal gain of an undiscounted sub-MDP and is formulated as an optimal stopping problem. At the outset, sufficient conditions for the ε-optimality of a policy are given.
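
For orientation, the sketch below shows the classical single-chain form of undiscounted value iteration with a span-based stopping test. It is not the aggregation method proposed in the paper, whose details the abstract does not give. It assumes every stationary policy induces a single aperiodic recurrent class (a unichain model), in which case the smallest and largest components of the one-step difference bracket the optimal gain, and the greedy policy is ε-optimal once the gap falls below ε. The function name, the array layout `P[a][s][t]` and `r[s][a]`, and the toy two-state model are all illustrative assumptions.

```python
def undiscounted_value_iteration(P, r, eps=1e-6, max_iter=10_000):
    """Span-stopped value iteration for an average-reward MDP.

    Assumes a unichain, aperiodic model (NOT the multichain case).
    P[a][s][t] is the probability of moving from s to t under action a;
    r[s][a] is the one-step reward for taking action a in state s.
    Returns (greedy policy, lower bound, upper bound) on the optimal gain.
    """
    n_actions = len(P)
    n_states = len(r)
    v = [0.0] * n_states
    for _ in range(max_iter):
        # One-step lookahead value of each (state, action) pair.
        q = [[r[s][a] + sum(P[a][s][t] * v[t] for t in range(n_states))
              for a in range(n_actions)]
             for s in range(n_states)]
        v_new = [max(row) for row in q]
        diff = [v_new[s] - v[s] for s in range(n_states)]
        lower, upper = min(diff), max(diff)
        if upper - lower < eps:
            policy = [row.index(max(row)) for row in q]
            return policy, lower, upper
        # Subtract a reference component so the iterates stay bounded;
        # a constant shift does not change the differences above.
        v = [x - v_new[0] for x in v_new]
    raise RuntimeError("span criterion not met within max_iter iterations")


# Hypothetical toy model: 2 states, 2 actions.
P = [
    [[0.5, 0.5], [0.5, 0.5]],  # action 0
    [[0.9, 0.1], [0.1, 0.9]],  # action 1
]
r = [
    [1.0, 0.0],  # rewards in state 0 for actions 0 and 1
    [0.0, 2.0],  # rewards in state 1 for actions 0 and 1
]
policy, lower, upper = undiscounted_value_iteration(P, r)
print(policy, lower, upper)
```

In this toy model the best stationary policy takes action 0 in state 0 and action 1 in state 1, with gain 11/6, and the returned bounds enclose that value. In a multichain model the gain can differ between states, so this single-interval test no longer applies directly; that is the setting the paper addresses.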

Author information

Authors and Affiliations

Department of Systems Engineering, Nagoya Institute of Technology, Showa-ku, 466, Nagoya, Japan
K. Ohno

About this article

Cite this article

Ohno, K. A value iteration method for undiscounted multichain Markov decision processes. Zeitschrift für Operations Research 32, 71–93 (1988). https://doi.org/10.1007/BF01919182

Received: 15 April 1987
Revised: 15 August 1987
Issue Date: March 1988
DOI: https://doi.org/10.1007/BF01919182
