
Stochastic control via direct comparison


Abstract

The standard approach to stochastic control is dynamic programming. In this paper, we introduce an alternative approach based on the direct comparison of the performance of any two policies. This is achieved by modeling the state process as a continuous-time, continuous-state Markov process and applying the same ideas as in the discrete-time, discrete-state case. The approach is simple and intuitively clear; it applies in the same way to problems with finite and infinite horizons, discounted and long-run-average performance criteria, and continuous and jump diffusions. Discounting is not needed when dealing with long-run-average performance. The approach provides a unified framework for stochastic control and other optimization theories and methodologies, including Markov decision processes, perturbation analysis, and reinforcement learning.
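
To make the comparison idea concrete, the following is a minimal numerical sketch (ours, not from the paper) of its discrete-time, discrete-state form, which the abstract says the continuous case mirrors. For two ergodic chains with transition matrices P and P', reward vectors f and f', stationary distributions pi and pi', and a potential g solving the Poisson equation (I - P)g + eta*1 = f, the long-run average rewards satisfy the performance difference formula eta' - eta = pi'[(f' + P'g) - (f + Pg)]: the two policies are compared directly, with no dynamic programming recursion. All transition matrices and rewards below are illustrative.

# A minimal numerical sketch (illustrative, not from the paper) of the
# direct-comparison idea in its discrete-time, discrete-state form:
#   eta' - eta = pi' [ (f' + P' g) - (f + P g) ],
# where g is the performance potential of the first chain, i.e. a solution
# of the Poisson equation  (I - P) g + eta * 1 = f.
import numpy as np

def stationary(P):
    """Stationary distribution pi of an ergodic transition matrix P."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])  # pi P = pi, sum(pi) = 1
    b = np.zeros(n + 1)
    b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]

def potential(P, f):
    """Potential g solving (I - P) g + eta 1 = f, normalized so pi g = 0."""
    pi = stationary(P)
    eta = pi @ f
    n = P.shape[0]
    # (I - P + 1 pi^T) is invertible for an ergodic chain (fundamental matrix).
    g = np.linalg.solve(np.eye(n) - P + np.outer(np.ones(n), pi), f - eta)
    return g, eta

# Two hypothetical policies on a 3-state chain (illustrative numbers only).
P  = np.array([[0.5, 0.3, 0.2], [0.2, 0.6, 0.2], [0.3, 0.3, 0.4]])
Pp = np.array([[0.4, 0.4, 0.2], [0.3, 0.4, 0.3], [0.2, 0.5, 0.3]])
f  = np.array([1.0, 2.0, 3.0])
fp = np.array([1.5, 1.0, 3.5])

g, eta = potential(P, f)
pip = stationary(Pp)
etap = pip @ fp

# Direct comparison: the difference uses only one policy's potential and the
# other policy's stationary distribution -- no value iteration needed.
diff = pip @ ((fp + Pp @ g) - (f + P @ g))
print(np.isclose(etap - eta, diff))  # True

Because pi' is strictly positive for an ergodic chain, any policy that improves the bracketed term componentwise improves the average reward; this is the standard route from the difference formula to policy iteration.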



Author information


Corresponding author

Correspondence to Xi-Ren Cao.

Additional information

Xi-Ren Cao was supported in part by Hong Kong UGC grant 610809; Yifan Xu was supported in part by NSFC grant 70771028.


About this article

Cite this article

Cao, X.-R., Wang, D.-X., Lu, T. et al. Stochastic control via direct comparison. Discrete Event Dyn Syst 21, 11–38 (2011). https://doi.org/10.1007/s10626-010-0093-4
