Abstract
The authors propose a data-driven direct adaptive control law, based on adaptive dynamic programming (ADP), for continuous-time stochastic linear systems with partially unknown dynamics and infinite-horizon quadratic risk-sensitive performance indices. Online system data are used to iteratively solve the generalized algebraic Riccati equation (GARE) and thereby learn the optimal control law directly. When the system noises are measurable, the adaptive control law is shown to converge to the optimal control law over time. When the system noises are unmeasurable, the GARE is solved iteratively with the least-squares solution computed from the measurable data alone, in place of the exact solution of the regression equation. The authors also analyze how the intensity of the system noises, the intensity of the exploration noises, the initial iterative matrix, and the sampling period affect the convergence of the ADP algorithm. Finally, two numerical simulation examples demonstrate the effectiveness of the proposed algorithms.
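The policy-iteration idea underlying ADP schemes of this kind can be illustrated with a minimal, model-based sketch: starting from a stabilizing gain, alternately solve a Lyapunov equation for the current policy's value matrix and then improve the gain, so that the value matrices converge to the Riccati solution. This is Kleinman's classical iteration for the standard (risk-neutral) ARE, not the paper's data-driven method; the risk-sensitive GARE carries an additional noise-dependent quadratic term that this sketch omits, and the function name `kleinman_iteration` is illustrative.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, solve_continuous_are

def kleinman_iteration(A, B, Q, R, K0, iters=20):
    """Model-based policy iteration (Kleinman's method) for the ARE.

    Each step solves the policy-evaluation Lyapunov equation
        (A - B K)^T P + P (A - B K) + Q + K^T R K = 0
    for the current gain K, then improves the gain; the value
    matrices P converge to the stabilizing ARE solution.
    """
    K = K0
    for _ in range(iters):
        Acl = A - B @ K
        P = solve_continuous_lyapunov(Acl.T, -(Q + K.T @ R @ K))
        K = np.linalg.solve(R, B.T @ P)  # policy improvement
    return P, K

# Toy second-order example: open-loop stable, so K0 = 0 is stabilizing.
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
P, K = kleinman_iteration(A, B, Q, R, K0=np.zeros((1, 2)))
print(np.allclose(P, solve_continuous_are(A, B, Q, R)))
```

The data-driven variant studied in the paper replaces the explicit Lyapunov solve with a least-squares regression on trajectory data, which is what allows the dynamics to be partially unknown.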
Ethics declarations
The authors declare no conflict of interest.
Additional information
This work was supported in part by the National Natural Science Foundation of China under Grant No. 62261136550 and in part by the Basic Research Project of Shanghai Science and Technology Commission under Grant No. 20JC1414000.
This paper was recommended for publication by Editor YU Chengpu.
Cite this article
Qiao, N., Li, T. Data-Driven Direct Adaptive Risk-Sensitive Control of Stochastic Systems. J Syst Sci Complex 37, 1446–1469 (2024). https://doi.org/10.1007/s11424-024-2421-z