Abstract
The standard approach to stochastic control is dynamic programming. In this paper, we introduce an alternative approach based on the direct comparison of the performance of any two policies. This is achieved by modeling the state process as a continuous-time, continuous-state Markov process and applying the same ideas as in the discrete-time, discrete-state case. The approach is simple and intuitively clear; it applies in the same way to problems with finite and infinite horizons, discounted and long-run-average performance criteria, and continuous and jump diffusions. No discounting is needed when dealing with long-run-average performance. The approach provides a unified framework for stochastic control and other optimization theories and methodologies, including Markov decision processes, perturbation analysis, and reinforcement learning.
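The direct-comparison idea can be illustrated in its discrete-time, discrete-state form (the paper's continuous-time, continuous-state setting follows the same pattern). The sketch below uses a hypothetical 3-state chain with made-up transition matrices and rewards: given the performance potentials g of one policy, obtained from the Poisson equation, the performance-difference formula yields the gap in long-run average reward between any two policies without any dynamic-programming recursion.

```python
import numpy as np

# Hypothetical 3-state chain: two policies, each with a transition
# matrix and a reward vector (illustrative numbers only).
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])   # policy 1 transitions
Pp = np.array([[0.7, 0.2, 0.1],
               [0.3, 0.5, 0.2],
               [0.2, 0.2, 0.6]])  # policy 2 transitions
f = np.array([1.0, 2.0, 3.0])     # policy 1 rewards
fp = np.array([1.5, 1.0, 3.5])    # policy 2 rewards

def stationary(P):
    """Stationary distribution pi solving pi P = pi, sum(pi) = 1."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.concatenate([np.zeros(n), [1.0]])
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

n = P.shape[0]
pi = stationary(P)
eta = pi @ f                      # long-run average reward, policy 1

# Performance potentials g solve the Poisson equation
# (I - P) g + eta * 1 = f, normalized so that pi @ g = 0.
g = np.linalg.solve(np.eye(n) - P + np.outer(np.ones(n), pi), f - eta)

# Direct comparison via the performance-difference formula:
# eta' - eta = pi' @ ((f' + P' g) - (f + P g)).
pip = stationary(Pp)
etap = pip @ fp
diff = pip @ (fp + Pp @ g - f - P @ g)
assert np.isclose(etap - eta, diff)
```

The key point is that the comparison requires only the potentials of one policy and the stationary distribution of the other, which is what makes policy iteration and sample-path-based (learning) methods natural in this framework.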
Additional information
Xi-Ren Cao was supported in part by the Hong Kong UGC under grant 610809; Yifan Xu was supported in part by NSFC grant 70771028.
Cite this article
Cao, XR., Wang, DX., Lu, T. et al. Stochastic control via direct comparison. Discrete Event Dyn Syst 21, 11–38 (2011). https://doi.org/10.1007/s10626-010-0093-4