Abstract
The functional equations of Markov renewal programming with a scalar gain rate g are v = max[q(f) − gT(f) + P(f)v; f ∈ A], where A is the Cartesian product set of allowed policies. When these functional equations are solved iteratively, convergent upper and lower bounds on the gain rate g were given by the author in J. Math. Anal. Appl. 34, 1971, 495–501. In this paper, an augmented iterative scheme is exhibited which supplies convergent upper and lower bounds on the value vector v as well.
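To make the kind of bounds the abstract refers to concrete, the following is a minimal Python sketch of undiscounted value iteration with Odoni/Hastings-type upper and lower bounds on the gain rate g, in the special case of unit holding times T(f) = 1 (so the functional equation reduces to the ordinary Markov decision process form). The problem data q and P below are hypothetical, and this illustrates only the gain-rate bounding idea, not the paper's augmented scheme for bounding the relative value vector v.

import numpy as np

def value_iteration_bounds(q, P, n_iter=200, tol=1e-10):
    """q[a, i]: one-step reward in state i under action a.
       P[a, i, j]: transition probability i -> j under action a.
       Returns (lower bound on g, upper bound on g, relative values)."""
    n_actions, n_states = q.shape
    v = np.zeros(n_states)
    g_lo, g_hi = -np.inf, np.inf
    for _ in range(n_iter):
        # One value-iteration sweep: v_new[i] = max_a [ q[a,i] + sum_j P[a,i,j] v[j] ]
        v_new = np.max(q + np.einsum('aij,j->ai', P, v), axis=0)
        diff = v_new - v
        # Odoni/Hastings-type bounds: min_i diff <= g <= max_i diff,
        # convergent under suitable (e.g. unichain, aperiodic) conditions.
        g_lo, g_hi = diff.min(), diff.max()
        # Subtract the value at a reference state so iterates stay bounded;
        # what remains converges to the relative value vector.
        v = v_new - v_new[0]
        if g_hi - g_lo < tol:
            break
    return g_lo, g_hi, v

# Tiny two-state, two-action example (hypothetical data).
q = np.array([[5.0, -1.0],
              [10.0, 1.0]])
P = np.array([[[0.5, 0.5], [0.4, 0.6]],
              [[0.2, 0.8], [0.7, 0.3]]])
g_lo, g_hi, v = value_iteration_bounds(q, P)
print(f"gain rate g in [{g_lo:.6f}, {g_hi:.6f}], relative values {v}")

Note that subtracting the reference-state value each sweep does not affect the difference vector (each row of P sums to one, so a constant shift of v shifts v_new by the same constant), which is why the gain-rate bounds are unchanged by the normalization.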
Summary
The functional equations for undiscounted Markov renewal programs with scalar average gain g read v = max[q(f) − gT(f) + P(f)v; f ∈ A], where A is the set of admissible policies. Convergent upper and lower bounds for the average gain were derived by the author in J. Math. Anal. Appl. 34, 1971, 495–501. In this paper, an improved iterative scheme is presented which also supplies convergent upper and lower bounds for the relative values.
References
Federgruen A., and P.J. Schweitzer: A Lyapunov Function for Markov Renewal Programming. (in preparation).
Hastings N.A.J.: Bounds on the Gain Rate of a Markov Decision Process. Operations Research 19, 1971, 240–244.
Howard R.A.: Semi-Markovian Decision Processes. Bull. Inter. Stat. Inst. 40, part 2, 1963, 625–652.
Jewell W.S.: Markov Renewal Programming I and II. Operations Research 11, 1963, 938–971.
MacQueen J.B.: A Modified Dynamic Programming Method for Markovian Decision Problems. J. Math. Anal. Appl. 14, 1966, 38–43.
—: A Test for Suboptimal Actions in Markovian Decision Problems. Operations Research 15, 1967, 559–561.
Odoni A.R.: On Finding the Maximal Gain for Markov Decision Processes. Operations Research 17, 1969, 857–860.
Ohno K.: A Unified Approach to Algorithms with a Suboptimality Test in Discounted Semi-Markov Decision Processes. J. Opns. Res. Soc. Japan 24, 1981, 296–324.
Schweitzer P.J.: Multiple Policy Improvements in Undiscounted Markov Renewal Programming. Operations Research 19, 1971a, 784–793.
—: Iterative Solution of the Functional Equations of Undiscounted Markov Renewal Programming. J. Math. Anal. Appl. 34, 1971b, 495–501.
Schweitzer P.J., and A. Federgruen: The Asymptotic Behavior of Undiscounted Value-Iteration in Markov Decision Problems. Math. of Operations Research 2, 1977, 360–381.
—: The Functional Equations of Undiscounted Markov Renewal Programming. Math. of Operations Research 3, 1978, 308–322.
—: Geometric Convergence of Value-Iteration in Multichain Markov Decision Problems. Adv. Applied Probability 11, 1979, 188–217.
White D.J.: Dynamic Programming, Markov Chains, and the Method of Successive Approximations. J. Math. Anal. Appl. 6, 1963, 373–376.
—: Elimination of Non-optimal Actions in Markov Decision Processes. In: Dynamic Programming and its Applications, ed. M.L. Puterman, Academic Press, 1978, 131–160.
Cite this article
Schweitzer, P.J. Iterative bounds on the relative value vector in undiscounted Markov renewal programming. Zeitschrift für Operations Research 29, 269–284 (1985). https://doi.org/10.1007/BF01918760