Abstract
We investigate a continuous-time, infinite-horizon linear quadratic stochastic optimal control problem with multiplicative noise in both the state and control variables. Using techniques from stochastic stability, exact observability, and stochastic approximation, we develop a value iteration algorithm for solving the corresponding generalized algebraic Riccati equation. Unlike the existing policy iteration algorithm, this algorithm does not require an initial stabilizing control. The algorithm can also be used to carry out the policy evaluation steps that arise in policy iteration. A simulation example is provided to validate the results.
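To make the idea of value iteration for a generalized algebraic Riccati equation (GARE) concrete, the following is a minimal sketch and not the paper's exact algorithm. It assumes the standard model dx = (Ax + Bu) dt + (Cx + Du) dW with cost E∫(x'Qx + u'Ru) dt, for which the GARE reads A'P + PA + C'PC + Q − (PB + C'PD)(R + D'PD)⁻¹(B'P + D'PC) = 0. The function names, the step-size schedule, and the numerical values are all illustrative assumptions; the update is a plain stochastic-approximation-style recursion P ← P + ε_k F(P) started from P = 0, so no stabilizing gain is supplied.

```python
import numpy as np


def gare_residual(P, A, B, C, D, Q, R):
    """Residual F(P) of the assumed GARE; F(P) = 0 at a solution."""
    S = P @ B + C.T @ P @ D          # cross term P B + C' P D
    G = R + D.T @ P @ D              # effective control weight R + D' P D
    return A.T @ P + P @ A + C.T @ P @ C + Q - S @ np.linalg.solve(G, S.T)


def value_iteration(A, B, C, D, Q, R, n_iter=20000,
                    step=lambda k: 0.1 / (k + 1) ** 0.6):
    """Iterate P <- P + eps_k * F(P), starting from P = 0.

    The default step sizes (an assumption of this sketch) satisfy
    sum(eps_k) = inf and sum(eps_k**2) < inf.
    """
    P = np.zeros_like(Q, dtype=float)
    for k in range(n_iter):
        P = P + step(k) * gare_residual(P, A, B, C, D, Q, R)
        P = 0.5 * (P + P.T)          # keep the iterate symmetric
    return P


# Hypothetical scalar example. Since 2a + c^2 > 0, the uncontrolled
# system is not mean-square stable, yet the iteration starts from P = 0.
A = np.array([[0.5]])
B = np.array([[1.0]])
C = np.array([[0.2]])
D = np.array([[0.1]])
Q = np.array([[1.0]])
R = np.array([[1.0]])

P = value_iteration(A, B, C, D, Q, R)
K = -np.linalg.solve(R + D.T @ P @ D, B.T @ P + D.T @ P @ C)  # feedback gain u = K x
print("P =", P)
print("K =", K)
print("residual =", gare_residual(P, A, B, C, D, Q, R))
```

For these scalar values the assumed GARE reduces to 1.03p² − 1.05p − 1 = 0, whose positive root is about 1.619, which gives a simple check on the iterate. A projection step onto a bounded set, often used alongside stochastic approximation, is omitted here for brevity.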
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant Nos. 61925306, 61821004, 11831010), the National Key R&D Program of China (Grant No. 2022YFA1006103), and the Natural Science Foundation of Shandong Province (Grant Nos. ZR2019ZD42, ZR2020ZD24). The authors greatly appreciate the efforts of the anonymous reviewers, which have improved the quality of this paper.
Cite this article
Wang, G., Zhang, H. Value iteration algorithm for continuous-time linear quadratic stochastic optimal control problems. Sci. China Inf. Sci. 67, 122204 (2024). https://doi.org/10.1007/s11432-023-3820-3