Abstract
We consider a discrete-time Markov decision process with a partially ordered state space and two feasible control actions in each state. Our goal is to find general conditions, which are satisfied in a broad class of applications to control of queues, under which an optimal control policy is monotonic. An advantage of our approach is that it easily extends to problems with both information and action delays, which are common in applications to high-speed communication networks, among others. The transition probabilities are stochastically monotone and the one-stage reward is submodular. We further assume that transitions from different states are coupled, in the sense that the state after a transition is distributed as a deterministic function of the current state and two random variables, one of which is controllable and the other uncontrollable. Finally, we make a monotonicity assumption about the sample-path effect of a pairwise switch of the actions in consecutive stages. Using induction on the horizon length, we demonstrate that optimal policies for the finite- and infinite-horizon discounted problems are monotonic. We apply these results to a single queueing facility with control of arrivals and/or services, under very general conditions. In this case, our results imply that an optimal control policy has threshold form. Finally, we show how monotonicity of an optimal policy extends in a natural way to problems with information and/or action delay, including delays of more than one time unit. Specifically, we show that, if a problem without delay satisfies our sufficient conditions for monotonicity of an optimal policy, then the same problem with information and/or action delay also has monotonic (e.g., threshold) optimal policies.
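The threshold structure described above can be illustrated numerically. The sketch below is not code from the paper; it runs finite-horizon backward induction for a hypothetical discrete-time single queue with two actions per state (admit or reject an arriving customer), with made-up parameters for arrival probability, service-completion probability, admission reward, holding cost, and discount factor. For this model the computed optimal policy at the final stage is monotone in the queue length, i.e., of threshold form.

```python
# Hypothetical illustration (not from the paper): finite-horizon value
# iteration for admission control of a single discrete-time queue with two
# actions (admit / reject). All parameter values below are made up.

N = 20            # queue capacity (state-space truncation)
T = 50            # horizon length
p, q = 0.5, 0.6   # per-stage arrival and service-completion probabilities
R, c = 5.0, 1.0   # reward per admitted customer, holding cost per customer
beta = 0.95       # discount factor

V = [0.0] * (N + 1)   # terminal value function: V_0(x) = 0


def after_service(V, y):
    """Expected value at state y after a possible service completion."""
    return q * V[max(y - 1, 0)] + (1 - q) * V[y]


for _ in range(T):                     # backward induction on the horizon
    Vnew, policy = [0.0] * (N + 1), [0] * (N + 1)
    for x in range(N + 1):
        v_reject = -c * x + beta * after_service(V, x)
        if x < N:                      # an arriving customer can be admitted
            v_admit = (p * R - c * x
                       + beta * (p * after_service(V, x + 1)
                                 + (1 - p) * after_service(V, x)))
        else:                          # full queue: admission is infeasible
            v_admit = v_reject
        policy[x] = 1 if v_admit > v_reject else 0
        Vnew[x] = max(v_admit, v_reject)
    V = Vnew

# Threshold form: admit in all states below some level, reject above it.
threshold = sum(policy)
assert all(policy[i] >= policy[i + 1] for i in range(N))
```

In this sketch the value function stays concave at every iteration, so the admit/reject comparison changes sign at most once in the queue length; `threshold` is the resulting critical level. The coupling and submodularity conditions of the paper are what guarantee this structure in general, beyond this particular birth-death example.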
Altman, E., Stidham, S. Optimality of monotonic policies for two-action Markovian decision processes, with applications to control of queues with delayed information. Queueing Syst 21, 267–291 (1995). https://doi.org/10.1007/BF01149165