Abstract
We consider discrete-time nonlinear controlled stochastic systems, modeled by controlled Markov chains with a denumerable state space and a compact action space. We study the corresponding stochastic control problem of maximizing long-run average rewards. Departing from the most common approach, which uses expected values of rewards, we focus on a sample path analysis of the stream of states/rewards. Under a Lyapunov function condition, we show that stationary policies obtained from the average reward optimality equation are not only average reward optimal, but indeed sample path average reward optimal, for almost all sample paths.
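The abstract refers to stationary policies obtained from the average reward optimality equation. As a purely illustrative sketch (not the paper's method, which treats denumerable state spaces under a Lyapunov condition), the equation can be solved numerically on a small finite chain by relative value iteration; the two-state rewards and transitions below are hypothetical, and an aperiodicity (damping) transform is used so the iteration converges even for cyclic dynamics:

```python
def relative_value_iteration(r, P, ref=0, tau=0.5, iters=200):
    """Relative value iteration for a finite average-reward MDP (illustrative).

    r[x][a]    : one-step reward in state x under action a
    P[x][a][y] : transition probability x -> y under action a
    ref        : reference state used to normalize the bias h
    tau        : aperiodicity transform P' = (1-tau)*I + tau*P, which
                 preserves stationary distributions, hence average rewards
    Returns (gain estimate g, bias h, greedy stationary policy).
    """
    n = len(r)
    h = [0.0] * n
    g = 0.0
    for _ in range(iters):
        # One step of the dynamic programming operator under P'.
        Th = [max(r[x][a]
                  + (1 - tau) * h[x]
                  + tau * sum(P[x][a][y] * h[y] for y in range(n))
                  for a in range(len(r[x])))
              for x in range(n)]
        g = Th[ref] - h[ref]                      # gain estimate
        h = [Th[x] - Th[ref] for x in range(n)]   # keep h(ref) = 0
    # Read off a greedy stationary policy from the converged bias.
    policy = []
    for x in range(n):
        q = [r[x][a]
             + (1 - tau) * h[x]
             + tau * sum(P[x][a][y] * h[y] for y in range(n))
             for a in range(len(r[x]))]
        policy.append(max(range(len(q)), key=q.__getitem__))
    return g, h, policy

# Toy example: staying at state 0 earns 1 per step, while the cycle
# 0 -> 1 -> 0 earns 0 then 3, for a long-run average of 1.5.
r = [[1.0, 0.0], [3.0, 0.0]]
P = [[[1.0, 0.0], [0.0, 1.0]],   # state 0: a=0 stays, a=1 moves to 1
     [[1.0, 0.0], [0.0, 1.0]]]   # state 1: a=0 moves to 0, a=1 stays
g, h, policy = relative_value_iteration(r, P)
print(g, policy)  # prints 1.5 [1, 0]: the cycling policy is optimal
```

The stationary policy recovered here is average reward optimal in expectation; the paper's contribution is that, under its Lyapunov condition, such policies are also optimal along almost every sample path.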
Acknowledgments
Research supported by a U.S.-México Collaborative Research Program funded by the National Science Foundation under grant NSF-INT 9201430, and by CONACyT-MEXICO.
Partially supported by the MAXTOR Foundation for Applied Probability and Statistics, under grant No. 01-01-56/04-93.
Research partially supported by the Engineering Foundation under grant RI-A-93-10, and by a grant from the AT&T Foundation.
Cite this article
Cavazos-Cadena, R., Fernández-Gaucherand, E. Denumerable controlled Markov chains with average reward criterion: Sample path optimality. ZOR - Methods and Models of Operations Research 41, 89–108 (1995). https://doi.org/10.1007/BF01415067