Abstract
This paper seeks to reduce the computation needed by iterative methods to find the expected discounted return in a finite semi-Markov or Markov chain. Two new norm-reducing extrapolations and a new iterative method are presented and shown to be convergent. Their application is illustrated on several 100-row problems. One of the extrapolations, the row sum extrapolation, appears to be promising.
Summary
This paper shows how the computational effort of iterative methods for determining the expected discounted return of a finite Markov or semi-Markov chain can be reduced. Two new norm-reducing extrapolations and a new iterative method are presented, and their convergence is shown. Results of several numerical test examples are reported.
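As a rough illustration of the kind of acceleration the abstract describes, the sketch below runs plain successive approximation for the expected discounted return, v = r + βPv, and adds a scalar extrapolation after each step. The function name `discounted_return` and the particular correction (the midpoint of MacQueen-style bounds on the remaining change) are illustrative assumptions, not the paper's row sum extrapolation, which is a more refined scheme.

```python
import numpy as np

def discounted_return(P, r, beta, tol=1e-10, max_iter=100_000):
    """Successive approximation for v = r + beta * P @ v, with a scalar
    extrapolation (illustrative sketch; assumes 0 < beta < 1 and P a
    stochastic matrix)."""
    v = np.zeros_like(r, dtype=float)
    for _ in range(max_iter):
        w = r + beta * (P @ v)            # one plain value-iteration step
        d = w - v                          # change produced by the step
        span = d.max() - d.min()
        # Extrapolate: add the discounted tail of the remaining change,
        # estimated by the midpoint of its smallest and largest components.
        v = w + beta / (1.0 - beta) * 0.5 * (d.min() + d.max())
        if span < tol * (1.0 - beta) / beta:
            break
    return v
```

The extrapolation exactly removes any constant component of the error, so convergence is governed by the contraction of the span seminorm rather than by the discount factor alone; this is the general mechanism by which norm-reducing extrapolations cut the iteration count.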
Porteus, E.L. Improved iterative computation of the expected discounted return in Markov and semi-Markov chains. Zeitschrift für Operations Research 24, 155–170 (1980). https://doi.org/10.1007/BF01919243