
Denumerable controlled Markov chains with average reward criterion: Sample path optimality

Published in Zeitschrift für Operations Research (ZOR – Methods and Models of Operations Research)

Abstract

We consider discrete-time nonlinear controlled stochastic systems, modeled by controlled Markov chains with denumerable state space and compact action space. The corresponding stochastic control problem of maximizing long-run average rewards is studied. Departing from the most common approach, which uses expected values of rewards, we focus on a sample path analysis of the stream of states/rewards. Under a Lyapunov function condition, we show that stationary policies obtained from the average reward optimality equation are not only average reward optimal, but indeed sample path average reward optimal, for almost all sample paths.
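For context, the average reward optimality equation mentioned in the abstract can be stated in standard notation (the symbols below follow the usual conventions for this problem class and are not taken verbatim from the paper): with state space $S$, admissible actions $A(x)$, reward $r(x,a)$, and transition law $p(\cdot \mid x,a)$,

```latex
% Average reward optimality equation (ARE): a constant \rho^* and a
% function h on S satisfying, for every state x,
\rho^* + h(x) \;=\; \max_{a \in A(x)}
  \Big[\, r(x,a) + \sum_{y \in S} p(y \mid x, a)\, h(y) \,\Big],
  \qquad x \in S.

% Sample path average reward optimality of a stationary policy f^*
% attaining the maximum above: along almost every sample path,
\limsup_{n \to \infty} \frac{1}{n} \sum_{t=0}^{n-1} r(X_t, A_t)
  \;\le\; \rho^* \quad \text{a.s. under any admissible policy,}

\lim_{n \to \infty} \frac{1}{n} \sum_{t=0}^{n-1} r\big(X_t, f^*(X_t)\big)
  \;=\; \rho^* \quad \text{a.s. under } f^*.
```

The paper's contribution is precisely the second pair of statements: the policy $f^*$ extracted from the optimality equation is shown, under a Lyapunov function condition, to attain $\rho^*$ pathwise and not merely in expectation.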




Additional information

Research supported by a U.S.-México Collaborative Research Program funded by the National Science Foundation under grant NSF-INT 9201430, and by CONACyT-MEXICO.

Partially supported by the MAXTOR Foundation for Applied Probability and Statistics, under grant No. 01-01-56/04-93.

Research partially supported by the Engineering Foundation under grant RI-A-93-10, and by a grant from the AT&T Foundation.


Cite this article

Cavazos-Cadena, R., Fernández-Gaucherand, E. Denumerable controlled Markov chains with average reward criterion: Sample path optimality. ZOR - Methods and Models of Operations Research 41, 89–108 (1995). https://doi.org/10.1007/BF01415067
