Abstract
We consider the optimization of finite-state, finite-action Markov decision processes under constraints. Costs and constraints are of the discounted or expected average type, and possibly of the finite-horizon type. We investigate the sensitivity of the optimal cost and optimal policy to changes in various parameters. We relate several optimization problems to a generic linear program, through which we investigate sensitivity issues. We establish conditions for the continuity of the optimal value in the discount factor. In particular, the optimal value and optimal policy for the expected average cost are obtained as limits of the discounted case as the discount factor goes to one. This generalizes a well-known result for the unconstrained case. We also establish the continuity in the discount factor for certain non-stationary policies. We then discuss the sensitivity of optimal policies and optimal values to small changes in the transition matrix and in the instantaneous cost functions. The last two results bear on the performance of adaptive policies for constrained MDPs under various cost criteria [3,5]. Finally, we establish the convergence of the optimal value for the discounted constrained finite-horizon problem to the optimal value of the corresponding infinite-horizon problem.
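The linear-programming connection mentioned above is, in spirit, the classical occupation-measure formulation of discounted MDPs (Manne, 1960; Kallenberg, 1983), extended to constraints as in Hordijk and Kallenberg (1984). The sketch below is only an illustration of that standard construction under assumed placeholder data, not the authors' formulation: the transition law P, costs c and d, bound V, discount factor beta, and initial distribution alpha are arbitrary, and the program is solved with scipy.optimize.linprog.

import numpy as np
from scipy.optimize import linprog

# Illustrative model data (not taken from the paper).
n_states, n_actions = 3, 2
beta = 0.9                                   # discount factor
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[x, a, y]
c = rng.random((n_states, n_actions))        # cost to be minimized
d = rng.random((n_states, n_actions))        # cost appearing in the constraint
V = 0.6 / (1.0 - beta)                       # constraint bound (assumed feasible)
alpha = np.full(n_states, 1.0 / n_states)    # initial distribution

# Decision variables: occupation measure rho(x, a), flattened to a vector.
# Balance equations: sum_a rho(y, a) - beta * sum_{x,a} P[x, a, y] rho(x, a) = alpha(y).
A_eq = np.zeros((n_states, n_states * n_actions))
for y in range(n_states):
    for x in range(n_states):
        for a in range(n_actions):
            A_eq[y, x * n_actions + a] = float(x == y) - beta * P[x, a, y]

# Minimize the discounted c-cost subject to the discounted d-cost being at most V.
res = linprog(c.ravel(),
              A_ub=d.ravel()[None, :], b_ub=[V],
              A_eq=A_eq, b_eq=alpha,
              bounds=(0, None))
assert res.success, "LP infeasible: the constraint bound V is too tight for this data"

rho = res.x.reshape(n_states, n_actions)
# A stationary randomized policy is recovered by normalizing the occupation measure.
policy = rho / rho.sum(axis=1, keepdims=True)
print("optimal constrained discounted cost:", res.fun)
print("optimal randomized stationary policy:\n", policy)

Re-solving this program for a sequence of discount factors approaching one gives a concrete way to observe the limiting behaviour of the optimal value that the abstract refers to.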
References
E. Altman and A. Shwartz, Non-stationary policies for controlled Markov chains, EE Pub. 633, Technion, June (1987).
E. Altman and A. Shwartz, Markov decision problems and state-action frequencies, EE Pub. 692, Technion, November 1988; SIAM J. Control Optim. 29, No. 4 (1991).
E. Altman and A. Shwartz, Adaptive control of constrained Markov chains, EE Pub. 717, Technion, March 1989; IEEE Trans. Autom. Control AC-36(1991)454–462.
E. Altman and A. Shwartz, Adaptive control of constrained Markov chains, Trans. 14th Symp. on Operations Research, Ulm, Germany (1989).
E. Altman and A. Shwartz, Adaptive control of constrained Markov chains: Criteria and policies, Ann. Oper. Res. 28(1991)101–134.
V.S. Borkar, A convex analytic approach to Markov decision processes, Prob. Theor. Rel. Fields 78(1988)583–602.
V.S. Borkar, Controlled Markov chains with constraints, Preprint (revised) (1989).
G.B. Dantzig, J. Folkman and N. Shapiro, On the continuity of the minimum set of a continuous function, J. Math. Anal. Appl. 17(1967)519–548.
R. Dekker, Denumerable Markov decision chains: Optimal policies for small interest rates, Thesis, Institute for Applied Mathematics and Computer Science, University of Leiden (1984).
C. Derman, Finite State Markovian Decision Processes (Academic Press, 1970).
C. Derman and M. Klein, Some remarks in finite horizon Markovian decision models, Oper. Res. 13(1965)272–278.
W.-R. Heilmann, Solving stochastic dynamic programming problems by linear programming — an annotated bibliography, Zeit. Oper. Res. 22(1978)43–53.
O. Hernández-Lerma, Adaptive Control of Markov Processes (Springer, 1989).
A. Hordijk and L.C.M. Kallenberg, Constrained undiscounted stochastic dynamic programming, Math. Oper. Res. 9, No. 2 (1984)276–289.
L.C.M. Kallenberg, Linear Programming and Finite Markovian Control Problems, Mathematical Centre Tracts 148, Amsterdam (1983).
A.S. Manne, Linear programming and sequential decisions, Manag. Sci. 6(1960)259–267.
P. Nain and K.W. Ross, Optimal priority assignment with hard constraint, IEEE Trans. Autom. Control AC-31, 10(1986)883–888.
K.W. Ross, Randomized and past-dependent policies for Markov decision processes with multiple constraints, Oper. Res. 37, No. 3 (May 1989).
K.W. Ross and B. Chen, Optimal scheduling of interactive and non-interactive traffic in telecommunication systems, IEEE Trans. Autom. Control AC-33, 3(1988)261–267.
M. Schäl, Estimation and control in discounted dynamic programming, Stochastics 20(1987)51–71.
A. Shwartz and A.M. Makowski, An optimal adaptive scheme for two competing queues with constraints, in: Analysis and Optimization of Systems, ed. A. Bensoussan and J.L. Lions, Lecture Notes in Control and Information Sciences (Springer, 1986), pp. 515–532.
Cite this article
Altman, E., Shwartz, A. Sensitivity of constrained Markov decision processes. Ann Oper Res 32, 1–22 (1991). https://doi.org/10.1007/BF02204825