Abstract
The authors propose a data-driven direct adaptive control law, based on adaptive dynamic programming (ADP), for continuous-time stochastic linear systems with partially unknown dynamics and infinite-horizon quadratic risk-sensitive performance indices. Online system data are used to iteratively solve the generalized algebraic Riccati equation (GARE) and thereby learn the optimal control law directly. When the system noises are measurable, the adaptive control law is shown to converge to the optimal control law over time. When the system noises are unmeasurable, the GARE is solved iteratively with the least-squares solution computed from the measurable data alone, in place of the exact solution of the regression equation. The authors also analyze how the intensity of the system noises, the intensity of the exploration noises, the initial iterative matrix, and the sampling period affect the convergence of the ADP algorithm. Finally, two numerical simulation examples demonstrate the effectiveness of the proposed algorithms.
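The policy-iteration idea underlying ADP schemes of this kind can be illustrated with a minimal, model-based sketch: starting from a stabilizing gain, alternately solve a Lyapunov equation for the current policy's value matrix and then improve the gain, so that the value matrices converge to the Riccati solution. This is Kleinman's classical iteration for the standard (risk-neutral) ARE, not the paper's data-driven method; the risk-sensitive GARE carries an additional noise-dependent quadratic term that this sketch omits, and the function name `kleinman_iteration` is illustrative.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, solve_continuous_are

def kleinman_iteration(A, B, Q, R, K0, iters=20):
    """Model-based policy iteration (Kleinman's method) for the ARE.

    Each step solves the policy-evaluation Lyapunov equation
        (A - B K)^T P + P (A - B K) + Q + K^T R K = 0
    for the current gain K, then improves the gain; the value
    matrices P converge to the stabilizing ARE solution.
    """
    K = K0
    for _ in range(iters):
        Acl = A - B @ K
        P = solve_continuous_lyapunov(Acl.T, -(Q + K.T @ R @ K))
        K = np.linalg.solve(R, B.T @ P)  # policy improvement
    return P, K

# Toy second-order example: open-loop stable, so K0 = 0 is stabilizing.
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
P, K = kleinman_iteration(A, B, Q, R, K0=np.zeros((1, 2)))
print(np.allclose(P, solve_continuous_are(A, B, Q, R)))
```

The data-driven variant studied in the paper replaces the explicit Lyapunov solve with a least-squares regression on trajectory data, which is what allows the dynamics to be partially unknown.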
Ethics declarations
The authors declare no conflict of interest.
Additional information
This work was supported in part by the National Natural Science Foundation of China under Grant No. 62261136550 and in part by the Basic Research Project of Shanghai Science and Technology Commission under Grant No. 20JC1414000.
This paper was recommended for publication by Editor YU Chengpu.
Cite this article
Qiao, N., Li, T. Data-Driven Direct Adaptive Risk-Sensitive Control of Stochastic Systems. J Syst Sci Complex 37, 1446–1469 (2024). https://doi.org/10.1007/s11424-024-2421-z