
Improved iterative computation of the expected discounted return in Markov and semi-Markov chains

  • Published in: Zeitschrift für Operations Research

Abstract

This paper seeks to reduce the computation needed by iterative methods to find the expected discounted return in a finite semi-Markov or Markov chain. Two new norm-reducing extrapolations and a new iterative method are presented and shown to be convergent. Their application is illustrated on several 100-row problems. One of the extrapolations, the row sum extrapolation, appears to be promising.
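The setting described above can be made concrete with a small sketch. The expected discounted return of a finite Markov chain is the fixed point of v = r + βPv, which successive approximation solves by repeated substitution. The paper's row sum extrapolation is not reproduced here; instead, the sketch below applies a simpler MacQueen-style scalar extrapolation (shifting the iterate by the discounted midpoint of the last change) to illustrate how an extrapolation step can cut the iteration count. The names `P`, `r`, and `beta` are illustrative assumptions, not notation from the paper.

```python
import numpy as np

def discounted_return(P, r, beta, tol=1e-10, max_iter=10_000):
    """Successive approximation for v = r + beta * P v on a finite
    Markov chain, finished with a MacQueen-style scalar extrapolation
    (a simpler relative of the paper's row sum extrapolation)."""
    v = np.zeros_like(r, dtype=float)
    for _ in range(max_iter):
        v_new = r + beta * (P @ v)          # one value-iteration sweep
        delta = v_new - v
        lo, hi = delta.min(), delta.max()   # componentwise change bounds
        v = v_new
        # the fixed point lies between v + beta/(1-beta)*lo
        # and v + beta/(1-beta)*hi, so stop when that bracket is tight
        if beta / (1.0 - beta) * (hi - lo) < tol:
            break
    # extrapolate to the midpoint of the bracket
    return v + beta / (1.0 - beta) * 0.5 * (lo + hi)

# tiny 2-state illustration (assumed data, not from the paper)
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
r = np.array([1.0, 2.0])
beta = 0.95
v = discounted_return(P, r, beta)
```

The returned `v` satisfies the fixed-point equation v = r + βPv to within the tolerance; the extrapolation step lets the loop stop as soon as the change vector's span, not its magnitude, is small.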

Zusammenfassung

This paper shows how the computational effort of iterative methods for determining the expected discounted return of a finite Markov or semi-Markov chain can be reduced. Two new norm-reducing extrapolations and a new iterative method are presented, and their convergence is proved. Results for several numerical test problems are reported.




Cite this article

Porteus, E.L. Improved iterative computation of the expected discounted return in Markov and semi-Markov chains. Zeitschrift für Operations Research 24, 155–170 (1980). https://doi.org/10.1007/BF01919243
