Abstract
The standard approach to stochastic control is dynamic programming. In this paper, we introduce an alternative approach based on the direct comparison of the performance of any two policies. This is achieved by modeling the state process as a continuous-time, continuous-state Markov process and applying the same ideas as in the discrete-time, discrete-state case. The approach is simple and intuitively clear; it applies in the same way to problems with finite and infinite horizons, discounted and long-run-average performance criteria, and continuous and jump diffusions. No discounting is needed when dealing with long-run-average performance. The approach provides a unified framework for stochastic control and other optimization theories and methodologies, including Markov decision processes, perturbation analysis, and reinforcement learning.
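The direct-comparison idea can be illustrated in its discrete-time, discrete-state form (the paper's continuous-time, continuous-state setting follows the same pattern). The sketch below uses a hypothetical 3-state chain with made-up transition matrices and rewards: given the performance potentials g of one policy, obtained from the Poisson equation, the performance-difference formula yields the gap in long-run average reward between any two policies without any dynamic-programming recursion.

```python
import numpy as np

# Hypothetical 3-state chain: two policies, each with a transition
# matrix and a reward vector (illustrative numbers only).
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])   # policy 1 transitions
Pp = np.array([[0.7, 0.2, 0.1],
               [0.3, 0.5, 0.2],
               [0.2, 0.2, 0.6]])  # policy 2 transitions
f = np.array([1.0, 2.0, 3.0])     # policy 1 rewards
fp = np.array([1.5, 1.0, 3.5])    # policy 2 rewards

def stationary(P):
    """Stationary distribution pi solving pi P = pi, sum(pi) = 1."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.concatenate([np.zeros(n), [1.0]])
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

n = P.shape[0]
pi = stationary(P)
eta = pi @ f                      # long-run average reward, policy 1

# Performance potentials g solve the Poisson equation
# (I - P) g + eta * 1 = f, normalized so that pi @ g = 0.
g = np.linalg.solve(np.eye(n) - P + np.outer(np.ones(n), pi), f - eta)

# Direct comparison via the performance-difference formula:
# eta' - eta = pi' @ ((f' + P' g) - (f + P g)).
pip = stationary(Pp)
etap = pip @ fp
diff = pip @ (fp + Pp @ g - f - P @ g)
assert np.isclose(etap - eta, diff)
```

The key point is that the comparison requires only the potentials of one policy and the stationary distribution of the other, which is what makes policy iteration and sample-path-based (learning) methods natural in this framework.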
Additional information
Xi-Ren Cao was supported in part by the Hong Kong UGC under grant 610809; Yifan Xu was supported in part by NSFC grant 70771028.
Cite this article
Cao, XR., Wang, DX., Lu, T. et al. Stochastic control via direct comparison. Discrete Event Dyn Syst 21, 11–38 (2011). https://doi.org/10.1007/s10626-010-0093-4