Abstract
We consider discrete-time nonlinear controlled stochastic systems, modeled by controlled Markov chains with a denumerable state space and a compact action space. We study the corresponding stochastic control problem of maximizing long-run average rewards. Departing from the most common approach, which uses expected values of rewards, we focus on a sample path analysis of the stream of states/rewards. Under a Lyapunov function condition, we show that stationary policies obtained from the average reward optimality equation are not only average reward optimal, but indeed sample path average reward optimal, for almost all sample paths.
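The abstract refers to stationary policies obtained from the average reward optimality equation. As a purely illustrative sketch (not the paper's method, which treats denumerable state spaces under a Lyapunov condition), the equation can be solved numerically on a small finite chain by relative value iteration; the two-state rewards and transitions below are hypothetical, and an aperiodicity (damping) transform is used so the iteration converges even for cyclic dynamics:

```python
def relative_value_iteration(r, P, ref=0, tau=0.5, iters=200):
    """Relative value iteration for a finite average-reward MDP (illustrative).

    r[x][a]    : one-step reward in state x under action a
    P[x][a][y] : transition probability x -> y under action a
    ref        : reference state used to normalize the bias h
    tau        : aperiodicity transform P' = (1-tau)*I + tau*P, which
                 preserves stationary distributions, hence average rewards
    Returns (gain estimate g, bias h, greedy stationary policy).
    """
    n = len(r)
    h = [0.0] * n
    g = 0.0
    for _ in range(iters):
        # One step of the dynamic programming operator under P'.
        Th = [max(r[x][a]
                  + (1 - tau) * h[x]
                  + tau * sum(P[x][a][y] * h[y] for y in range(n))
                  for a in range(len(r[x])))
              for x in range(n)]
        g = Th[ref] - h[ref]                      # gain estimate
        h = [Th[x] - Th[ref] for x in range(n)]   # keep h(ref) = 0
    # Read off a greedy stationary policy from the converged bias.
    policy = []
    for x in range(n):
        q = [r[x][a]
             + (1 - tau) * h[x]
             + tau * sum(P[x][a][y] * h[y] for y in range(n))
             for a in range(len(r[x]))]
        policy.append(max(range(len(q)), key=q.__getitem__))
    return g, h, policy

# Toy example: staying at state 0 earns 1 per step, while the cycle
# 0 -> 1 -> 0 earns 0 then 3, for a long-run average of 1.5.
r = [[1.0, 0.0], [3.0, 0.0]]
P = [[[1.0, 0.0], [0.0, 1.0]],   # state 0: a=0 stays, a=1 moves to 1
     [[1.0, 0.0], [0.0, 1.0]]]   # state 1: a=0 moves to 0, a=1 stays
g, h, policy = relative_value_iteration(r, P)
print(g, policy)  # prints 1.5 [1, 0]: the cycling policy is optimal
```

The stationary policy recovered here is average reward optimal in expectation; the paper's contribution is that, under its Lyapunov condition, such policies are also optimal along almost every sample path.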
Acknowledgments
Research supported by a U.S.-México Collaborative Research Program funded by the National Science Foundation under grant NSF-INT 9201430, and by CONACyT-MEXICO.
Partially supported by the MAXTOR Foundation for Applied Probability and Statistics, under grant No. 01-01-56/04-93.
Research partially supported by the Engineering Foundation under grant RI-A-93-10, and by a grant from the AT&T Foundation.
Cite this article
Cavazos-Cadena, R., Fernández-Gaucherand, E. Denumerable controlled Markov chains with average reward criterion: Sample path optimality. ZOR - Methods and Models of Operations Research 41, 89–108 (1995). https://doi.org/10.1007/BF01415067