
Data-Driven Direct Adaptive Risk-Sensitive Control of Stochastic Systems

Journal of Systems Science and Complexity

Abstract

The authors propose a data-driven direct adaptive control law based on the adaptive dynamic programming (ADP) algorithm for continuous-time stochastic linear systems with partially unknown dynamics and infinite-horizon quadratic risk-sensitive indices. Online data of the system are used to iteratively solve the generalized algebraic Riccati equation (GARE) and to learn the optimal control law directly. For the case with measurable system noises, the authors show that the adaptive control law approximates the optimal control law asymptotically. For the case with unmeasurable system noises, the authors iteratively solve the GARE using the least-squares solution computed from the measurable data alone, rather than the exact solution of the regression equation. The authors also study how the intensity of the system noises, the intensity of the exploration noises, the initial iterative matrix, and the sampling period affect the convergence of the ADP algorithm. Finally, two numerical simulation examples demonstrate the effectiveness of the proposed algorithms.
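
The paper's exact equations are not reproduced on this page, but the iteration it builds on is standard. A common infinite-horizon quadratic risk-sensitive index has the form J(u) = limsup_{T→∞} (2/(θT)) log E[exp((θ/2) ∫₀ᵀ (xᵀQx + uᵀRu) dt)], whose optimal feedback gain is characterized by a GARE that augments the usual Riccati equation with a quadratic noise term. The sketch below is a minimal model-based illustration of a Kleinman-type policy iteration on a GARE of this kind, not the authors' data-driven algorithm: the system matrices A, B, D, the weights Q, R, and the risk parameter gamma are all assumed for the example, and the specific GARE form in the comments is an assumption consistent with the standard risk-sensitive setting.

```python
# Minimal model-based sketch (assumed system, not the authors' algorithm):
# Kleinman-type policy iteration for a risk-sensitive GARE of the assumed form
#   (A - B K)' P + P (A - B K) + Q + K' R K + (1/gamma^2) P D D' P = 0,
# followed by the gain update K = R^{-1} B' P.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Illustrative 2-state, 1-input system; all matrices are assumptions.
A = np.array([[0.0, 1.0],
              [-1.0, -2.0]])      # open-loop Hurwitz, so K = 0 is stabilizing
B = np.array([[0.0], [1.0]])
D = np.array([[0.1], [0.1]])      # noise input channel
Q = np.eye(2)                     # state weight
R = np.eye(1)                     # control weight
gamma = 5.0                       # risk-sensitivity parameter (assumed)

K = np.zeros((1, 2))              # initial stabilizing gain
P = np.zeros((2, 2))
for _ in range(20):               # outer policy iteration
    Acl = A - B @ K
    for _ in range(50):           # inner fixed point for the quadratic term
        Qbar = Q + K.T @ R @ K + (P @ D @ D.T @ P) / gamma**2
        # Policy evaluation: solve Acl' P + P Acl + Qbar = 0 with the
        # quadratic P-dependent term frozen at the previous iterate.
        P = solve_continuous_lyapunov(Acl.T, -Qbar)
    K = np.linalg.solve(R, B.T @ P)   # policy improvement

print("approximate GARE solution P:\n", P)
print("approximate optimal gain K:\n", K)
```

The data-driven algorithm in the paper replaces the policy-evaluation step, which here requires knowledge of A and B, with least-squares regressions on online state and input data; the inner fixed-point loop over the quadratic PDDᵀP term above is just one simple way to evaluate a policy under a risk-sensitive index when the model is available.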



Author information

Corresponding author

Correspondence to Tao Li.

Ethics declarations

The authors declare no conflict of interest.

Additional information

This work was supported in part by the National Natural Science Foundation of China under Grant No. 62261136550 and in part by the Basic Research Project of Shanghai Science and Technology Commission under Grant No. 20JC1414000.

This paper was recommended for publication by Editor YU Chengpu.


About this article

Cite this article

Qiao, N., Li, T. Data-Driven Direct Adaptive Risk-Sensitive Control of Stochastic Systems. J Syst Sci Complex 37, 1446–1469 (2024). https://doi.org/10.1007/s11424-024-2421-z
