Comprehensive comparison of online ADP algorithms for continuous-time optimal control

Artificial Intelligence Review

Abstract

Online learning is an important property of adaptive dynamic programming (ADP). Online observations contain rich information about the system dynamics, and ADP algorithms can use them to learn the optimal control policy. This paper reviews research on online ADP algorithms for the optimal control of continuous-time systems. Through intensive study, ADP has developed toward model-free and data-efficient methods. After introducing the algorithms separately, we compare their performance on the same problem. This paper aims to provide a comprehensive understanding of continuous-time online ADP algorithms.

Acknowledgements

This work is supported partly by the National Natural Science Foundation of China (61603382, 61573353, 61533017) and partly by the Early Career Development Award of SKLMCCS.

Author information

Corresponding author

Correspondence to Dongbin Zhao.

About this article

Cite this article

Zhu, Y., Zhao, D. Comprehensive comparison of online ADP algorithms for continuous-time optimal control. Artif Intell Rev 49, 531–547 (2018). https://doi.org/10.1007/s10462-017-9548-4

  • DOI: https://doi.org/10.1007/s10462-017-9548-4
