Optimality of monotonic policies for two-action Markovian decision processes, with applications to control of queues with delayed information

Published in: Queueing Systems 21, 267–291 (1995)

Abstract

We consider a discrete-time Markov decision process with a partially ordered state space and two feasible control actions in each state. Our goal is to find general conditions, which are satisfied in a broad class of applications to control of queues, under which an optimal control policy is monotonic. An advantage of our approach is that it easily extends to problems with both information and action delays, which are common in applications to high-speed communication networks, among others. We assume that the transition probabilities are stochastically monotone and that the one-stage reward is submodular. We further assume that transitions from different states are coupled, in the sense that the state after a transition is distributed as a deterministic function of the current state and two random variables, one of which is controllable and the other uncontrollable. In addition, we make a monotonicity assumption about the sample-path effect of a pairwise switch of the actions in consecutive stages. Using induction on the horizon length, we demonstrate that optimal policies for the finite- and infinite-horizon discounted problems are monotonic. We apply these results to a single queueing facility with control of arrivals and/or services, under very general conditions. In this case, our results imply that an optimal control policy has threshold form. Finally, we show how monotonicity of an optimal policy extends in a natural way to problems with information and/or action delay, including delays of more than one time unit. Specifically, we show that, if a problem without delay satisfies our sufficient conditions for monotonicity of an optimal policy, then the same problem with information and/or action delay also has monotonic (e.g., threshold) optimal policies.
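
For intuition about the threshold result, the following is a minimal numerical sketch, not the authors' general model or proof technique: finite-horizon value iteration for a hypothetical single discrete-time queue with admission control, Bernoulli arrivals and service completions, a per-admission reward, and a linear holding cost. All parameter values and the simple transition model are illustrative assumptions; under them the computed admission policy admits exactly below a threshold on the queue length, consistent with the monotone (threshold) structure described above.

```python
# A minimal numerical sketch (illustrative assumptions, not the paper's
# general model): finite-horizon value iteration for a single discrete-time
# queue with admission control, showing a threshold-form optimal policy.
# Assumed primitives: Bernoulli arrivals (prob. p), Bernoulli service
# completions (prob. q), reward R per admitted arrival, linear holding
# cost c per customer per period, discount factor beta, buffer size N.

import numpy as np

p, q = 0.4, 0.5              # arrival / service-completion probabilities
R, c, beta = 5.0, 1.0, 0.95  # admission reward, holding cost, discount
N, horizon = 20, 60          # buffer size, number of stages

def stage_value(V, x, admit):
    """Expected one-stage reward plus discounted continuation value when
    the queue length is x and the admission decision is `admit`."""
    reward = -c * x + (p * R if admit else 0.0)  # reward earned only if an arrival occurs
    value = 0.0
    for arrives, pa in ((1, p), (0, 1.0 - p)):
        for serves, ps in ((1, q), (0, 1.0 - q)):
            nxt = min(max(x + (arrives if admit else 0) - serves, 0), N)
            value += pa * ps * V[nxt]
    return reward + beta * value

V = np.zeros(N + 1)                  # terminal value function
policy = np.zeros(N + 1, dtype=int)
for _ in range(horizon):
    newV = np.empty_like(V)
    for x in range(N + 1):
        v_reject = stage_value(V, x, admit=False)
        # admission is infeasible when the buffer is full
        v_admit = stage_value(V, x, admit=True) if x < N else -np.inf
        policy[x] = int(v_admit > v_reject)
        newV[x] = max(v_admit, v_reject)
    V = newV

# Under these assumptions the computed policy is monotone in the queue
# length: admit below some threshold, reject at and above it.
print("admit decisions by queue length:", policy.tolist())
print("apparent threshold:", int(policy.sum()))
```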

Cite this article

Altman, E., Stidham, S. Optimality of monotonic policies for two-action Markovian decision processes, with applications to control of queues with delayed information. Queueing Syst 21, 267–291 (1995). https://doi.org/10.1007/BF01149165
