Reinforcement learning-based online adaptive controller design for a class of unknown nonlinear discrete-time systems with time delays

  • Review

Neural Computing and Applications

Abstract

This paper is concerned with an online adaptive control strategy for a class of unknown nonlinear discrete-time systems with time delays. The main objective is to establish an online adaptive control strategy based on a reinforcement learning (RL) algorithm such that a nonquadratic performance index is minimized and the closed-loop system with time delays is stable. To simplify the treatment of control systems with time delays, a time-delay function is designed to eliminate the control-delay term. An online adaptive control algorithm based on RL is then presented to approximate the desired control law and optimize the long-term performance function. Using Lyapunov theory, it is proved that the online adaptive controller is effective and that all signals of the closed-loop system are uniformly ultimately bounded. Simulation results demonstrate the validity and feasibility of the proposed adaptive control strategy.

Figs. 1 to 10 (images not reproduced here)

Acknowledgements

This work was supported by the National Natural Science Foundation of China (61433004, 61627809, 61621004), and IAPI Fundamental Research Funds 2013ZCX14.

Author information

Corresponding author

Correspondence to Huaguang Zhang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest with respect to this work.

Appendix

1.1 Proof of Theorem 1

Consider the Lyapunov candidate function defined as

$$\begin{aligned} \begin{aligned} L(k)&= \sum \limits _{i = 1}^5 {{L_i}(k)} \\&=\displaystyle \frac{{{\gamma _1}}}{3}x_{}^T(k)x(k) + \frac{{{\gamma _2}}}{{{\eta _a}}}{\tilde{w}}_a^T(k){\tilde{w}}_a^{}(k) + \frac{{{\gamma _3}}}{{{\eta _c}}}{\tilde{w}}_c^T(k){\tilde{w}}_c^{}(k) \\&\quad +\displaystyle \frac{{{\gamma _4}}}{{{\eta _M}}}{\tilde{w}}_M^T(k){\tilde{w}}_M^{}(k) + {\gamma _5}{\left\| {{\zeta _c}(k - 1)} \right\| ^2} \end{aligned} \end{aligned}$$
(40)

where \({\gamma _i} \in {R^ + }\), \(i = 1, \ldots, 5\), are design parameters.
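As a small illustration of how the candidate (40) aggregates its five terms, the sketch below evaluates \(L(k)\) numerically. It assumes the weight-estimation errors are available as NumPy vectors (possible only in simulation, where the ideal weights are known) and treats \(\zeta_c(k-1)\) as a scalar; the function and argument names are hypothetical.

```python
import numpy as np


def lyapunov_candidate(x, wa_err, wc_err, wM_err, zeta_c_prev,
                       gammas, eta_a, eta_c, eta_M):
    """Evaluate L(k) in (40); every gamma and eta must be positive.

    x           -- state x(k)
    wa_err      -- actor weight error  w~_a(k) = w^_a(k) - w_a
    wc_err      -- critic weight error w~_c(k) = w^_c(k) - w_c
    wM_err      -- compensator weight error w~_M(k)
    zeta_c_prev -- scalar critic error term zeta_c(k-1)
    """
    g1, g2, g3, g4, g5 = gammas
    x, wa_err, wc_err, wM_err = (np.asarray(v, dtype=float)
                                 for v in (x, wa_err, wc_err, wM_err))
    L1 = g1 / 3.0 * (x @ x)
    L2 = g2 / eta_a * (wa_err @ wa_err)
    L3 = g3 / eta_c * (wc_err @ wc_err)
    L4 = g4 / eta_M * (wM_err @ wM_err)
    L5 = g5 * zeta_c_prev ** 2
    return L1 + L2 + L3 + L4 + L5


# Hypothetical values: only the state term is nonzero, so L = (3/3) * 9.
print(lyapunov_candidate([3.0, 0.0], [0.0, 0.0], [0.0], [0.0], 0.0,
                         gammas=(3.0, 1.0, 1.0, 1.0, 1.0),
                         eta_a=0.1, eta_c=0.1, eta_M=0.1))  # 9.0
```

Every term is a weighted squared norm, so \(L(k) \ge 0\) with equality only when all arguments vanish, which is what makes it a valid Lyapunov candidate.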

Let \({L_1}(k) = \displaystyle \frac{{{\gamma _1}}}{3}x^T(k)x(k)\); its first-order difference satisfies

$$\begin{aligned} \begin{aligned} \Delta {L_1}(k)&= \displaystyle \frac{{{\gamma _1}}}{3}(x_{}^T(k + 1)x(k + 1) - x_{}^T(k)x(k)) \\&\le {\gamma _1}{\bar{g}}_M^2\zeta _a^2 + {\gamma _1}{\bar{g}}_M^2\delta _a^2 + {\gamma _1}\lambda _{\max }^2({\alpha _1}){\left\| {x(k)} \right\| ^2} - \displaystyle \frac{{{\gamma _1}}}{3}{\left\| {x(k)} \right\| ^2} \end{aligned} \end{aligned}$$
(41)

where \(\lambda_{\max}(\alpha_1)\) is the maximum eigenvalue of the matrix \(\alpha_1\).
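The passage from \(L_1(k)\) to (41) rests on two elementary bounds. The first is \(\|a+b+c\|^2 \le 3(\|a\|^2+\|b\|^2+\|c\|^2)\), which is why the factor \(\gamma_1/3\) in \(L_1(k)\) produces plain \(\gamma_1\) coefficients. The second is \(\|\alpha_1 x\|^2 \le \lambda_{\max}^2(\alpha_1)\|x\|^2\). The sketch below checks both on random samples. It assumes \(x(k+1)\) is a sum of three such terms, as the step in (44) that replaces \(x(k+1) - \alpha_1 x(k)\) by \(g_M\zeta_a(k) - g_M\delta_a(k)\) suggests. It also takes \(\alpha_1\) to be symmetric positive semidefinite, so that its largest eigenvalue equals its spectral norm; that assumption is made here for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)


def worst_three_term_ratio(trials=2000, n=3):
    """Largest observed ||a+b+c||^2 / (3(||a||^2+||b||^2+||c||^2)); should not exceed 1."""
    worst = 0.0
    for _ in range(trials):
        a, b, c = rng.standard_normal((3, n))
        lhs = np.sum((a + b + c) ** 2)
        rhs = 3.0 * (a @ a + b @ b + c @ c)
        worst = max(worst, lhs / rhs)
    return worst


def eigen_bound_holds(trials=2000, n=3):
    """Check ||alpha1 x||^2 <= lambda_max(alpha1)^2 ||x||^2 for random symmetric PSD alpha1."""
    for _ in range(trials):
        m = rng.standard_normal((n, n))
        alpha1 = m @ m.T                      # symmetric PSD by construction
        lam_max = np.linalg.eigvalsh(alpha1).max()
        x = rng.standard_normal(n)
        lhs = np.sum((alpha1 @ x) ** 2)
        if lhs > lam_max ** 2 * (x @ x) * (1.0 + 1e-9) + 1e-12:
            return False
    return True


print(worst_three_term_ratio() <= 1.0 + 1e-12, eigen_bound_holds())
```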

Let \({L_2}(k) = \displaystyle \frac{{{\gamma _2}}}{{{\eta _a}}}\tilde{w}_a^T(k){\tilde{w}}_a(k)\); its first-order difference is

$$\begin{aligned} \begin{aligned} \Delta {L_2}(k) = \frac{{{\gamma _2}}}{{{\eta _a}}}{\left\| {{{{\tilde{w}}}_a}(k + 1)} \right\| ^2} - \frac{{{\gamma _2}}}{{{\eta _a}}}{\left\| {{{{\tilde{w}}}_a}(k)} \right\| ^2}. \end{aligned} \end{aligned}$$
(42)

Subtracting \({w_a}\) from both sides of (34) gives

$$\begin{aligned} {{\tilde{w}}_a}(k + 1) = {{\tilde{w}}_a}(k) - {\eta _a}{\psi _a}({s_a}(k))(x(k + 1) - {\alpha _1}x(k) + {\hat{J}}(k)). \end{aligned}$$
(43)
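Equation (43) is the error form of the actor weight update. In simulation, the estimate itself is what gets updated. The sketch below shows that estimate update under two assumptions: the state is scalar, so \(\alpha_1\) is a scalar gain, and the activation vector \(\psi_a(s_a(k))\) has already been evaluated. Because the ideal weight \(w_a\) is constant, subtracting it from both sides of this update reproduces (43). The name `actor_update` is hypothetical.

```python
import numpy as np


def actor_update(w_a_hat, psi_a, x_next, x, J_hat, alpha1, eta_a):
    """One actor step consistent with (43):
    w^_a(k+1) = w^_a(k) - eta_a * psi_a(s_a(k)) * (x(k+1) - alpha1 x(k) + J^(k)).
    Scalar state assumed; psi_a is the activation vector at time k."""
    error_signal = x_next - alpha1 * x + J_hat
    return (np.asarray(w_a_hat, dtype=float)
            - eta_a * np.asarray(psi_a, dtype=float) * error_signal)


# Hypothetical numbers: error signal = 1.0 - 0.5 * 1.0 + 0.0 = 0.5
w_next = actor_update([1.0, 2.0], [0.5, 0.5], x_next=1.0, x=1.0,
                      J_hat=0.0, alpha1=0.5, eta_a=0.2)
print(w_next)  # [0.95 1.95]
```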

Substituting (43) into (42) yields

$$\begin{aligned} \Delta L_2(k) &\le \frac{\gamma_2}{\eta_a}\left[\eta_a^2\left\|\psi_a(s_a(k))\right\|^2(\alpha_1 x(k) - x(k+1) - \hat{J}(k))^2\right] \\&\quad - \frac{2\gamma_2}{\eta_a}\tilde{w}_a^T(k)\left[\eta_a\psi_a(s_a(k))(\hat{J}(k) + x(k+1) - \alpha_1 x(k))\right] \\&= \eta_a\gamma_2\left\|\psi_a(s_a(k))\right\|^2(\alpha_1 x(k) - x(k+1) - \hat{J}(k))^2 - 2\gamma_2\zeta_a(k) \\&\quad \times (\hat{J}(k) + x(k+1) - \alpha_1 x(k)) \\&\le \eta_a\gamma_2\left\|\psi_a(s_a(k))\right\|^2(\alpha_1 x(k) - x(k+1) - \hat{J}(k))^2 - 2\gamma_2\zeta_a(k) \\&\quad \times \left(g_M(x(k),x(k-\sigma))\zeta_a(k) - g_M(x(k),x(k-\sigma))\delta_a(k) + \hat{J}(k)\right) \\&\le \gamma_2\Big\{-(\underline{g}_M - \eta_a\left\|\psi_a(s_a(k))\right\|^2\bar{g}_M^2)\zeta_a^2(k) - 2\zeta_a(k) \\&\quad \times \left[1 - \eta_a\left\|\psi_a(s_a(k))\right\|^2 g_M(x(k),x(k-\sigma))\right] \\&\quad \times \left[-g_M(x(k),x(k-\sigma))\delta_a(k) + \hat{J}(k)\right] - \underline{g}_M\zeta_a^2(k) \\&\quad + \eta_a\left\|\psi_a(s_a(k))\right\|^2\left[-g_M(x(k),x(k-\sigma))\delta_a(k) + \hat{J}(k)\right]^2\Big\}. \end{aligned}$$
(44)

Adding and subtracting the term \(\dfrac{(1 - \eta_a\|\psi_a(s_a(k))\|^2\underline{g}_M)^2}{\underline{g}_M - \eta_a\|\psi_a(s_a(k))\|^2\bar{g}_M^2}\,[-g_M(x(k),x(k-\sigma))\delta_a(k) + \hat{J}(k)]^2\) in the above inequality and completing the square gives

$$\begin{aligned} \begin{aligned} \Delta {L_2}(k)&\le {\gamma _2}\left\{ { - {{{\underline{g}} }_M}\zeta _a^2 + \frac{{{{(1 - {\eta _a}{{\left\| {{\psi _a}({s_a}(k))} \right\| }^2}{{{\underline{g}} }_M})}^2}}}{{{{{\underline{g}} }_M} - {\eta _a}{{\left\| {{\psi _a}({s_a}(k))} \right\| }^2}{\bar{g}}_M^2}}} \right. \\&\quad \times {[ - {g_M}(x(k),x(k - \sigma )){\delta _a}(k) + {\hat{J}}(k)]^2} \\&\quad - ({{\underline{g}} _M} - {\eta _a}{\left\| {{\psi _a}({s_a}(k))} \right\| ^2}{\bar{g}}_M^2)\\&\quad \times \left[ {{\zeta _a} + \frac{{1 - {\eta _a}{{\left\| {{\psi _a}({s_a}(k))} \right\| }^2}{{{\underline{g}} }_M}}}{{{{{\underline{g}} }_M} - {\eta _a}{{\left\| {{\psi _a}({s_a}(k))} \right\| }^2}{\bar{g}}_M^2}}} \right. \\&\quad {\left. { \times ( - {g_M}(x(k),x(k - \sigma )){\delta _a}(k) + {\hat{J}}(k))} \right] ^2}\\&\quad + \left. {{\eta _a}{{\left\| {{\psi _a}({s_a}(k))} \right\| }^2}{{[ - {g_M}(x(k),x(k - \sigma )){\delta _a}(k) + {\hat{J}}(k)]}^2}} \right\} .\\ \end{aligned} \end{aligned}$$
(45)
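The step from (44) to (45) is a completion of the square. Let \(A = \underline{g}_M - \eta_a\|\psi_a(s_a(k))\|^2\bar{g}_M^2 > 0\), let \(C\) be the bracketed factor multiplying \(-2\zeta_a(k)\), and let \(B = -g_M(x(k),x(k-\sigma))\delta_a(k) + \hat{J}(k)\). The identity used is \(-A\zeta^2 - 2\zeta CB = -A(\zeta + CB/A)^2 + C^2B^2/A\), so the left side never exceeds \(C^2B^2/A\). The short check below confirms this numerically for scalar values. The symbols \(A\), \(B\), and \(C\) are shorthand introduced only for this sketch.

```python
import random


def complete_square_check(trials=10_000, seed=0):
    """Return the largest identity mismatch and whether -A z^2 - 2 z C B <= C^2 B^2 / A held."""
    rng = random.Random(seed)
    max_gap, bound_ok = 0.0, True
    for _ in range(trials):
        A = rng.uniform(0.5, 2.0)   # must be positive; the eta_a condition ensures this
        B, C, z = (rng.uniform(-2.0, 2.0) for _ in range(3))
        lhs = -A * z**2 - 2.0 * z * C * B
        rhs = -A * (z + C * B / A) ** 2 + (C * B) ** 2 / A
        max_gap = max(max_gap, abs(lhs - rhs))
        bound_ok = bound_ok and lhs <= (C * B) ** 2 / A + 1e-12
    return max_gap, bound_ok


gap, ok = complete_square_check()
print(gap < 1e-9, ok)  # True True
```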

Define \(\gamma_2 = \gamma_{21}\gamma_{22}\), where \(\gamma_{22}\) is chosen such that \(\gamma_{22}\left[\dfrac{(1 - \eta_a\|\psi_a(s_a(k))\|^2\underline{g}_M)^2}{\underline{g}_M - \eta_a\|\psi_a(s_a(k))\|^2\bar{g}_M^2} + \eta_a\|\psi_a(s_a(k))\|^2\right] \le \dfrac{1}{2}\). Then (45) becomes

$$\begin{aligned} \begin{aligned} \Delta {L_2}(k)&\le -{\gamma _2}({{\underline{g}} _M} - {\eta _a}{\left\| {{\psi _a}({s_a}(k))} \right\| ^2}{\bar{g}}_M^2)\left[ {{\zeta _a} + \frac{{1 - {\eta _a}{{\left\| {{\psi _a}({s_a}(k))} \right\| }^2}{{{\underline{g}} }_M}}}{{{{{\underline{g}} }_M} - {\eta _a}{{\left\| {{\psi _a}({s_a}(k))} \right\| }^2}{\bar{g}}_M^2}}} \right. \\&\quad {\left. { \times ( - g_M(x(k),x(k - \sigma )){\delta _a}(k) + {\hat{J}}(k))} \right] ^2}\\&\quad - {\gamma _2}{{\underline{g}} _M}\zeta _a^2 + \frac{{{\gamma _{21}}}}{2}{[ - g_M(x(k),x(k - \sigma )){\delta _a}(k) + {\hat{J}}(k)]^2}\\&\le - {\gamma _2}{{\underline{g}} _M}\zeta _a^2 - {\gamma _2}({{\underline{g}} _M} - {\eta _a}{\left\| {{\psi _a}({s_a}(k))} \right\| ^2}{\bar{g}}_M^2)\\&\quad \times \, \left[ {{\zeta _a} + \frac{{1 - {\eta _a}{{\left\| {{\psi _a}({s_a}(k))} \right\| }^2}{{{\underline{g}} }_M}}}{{{{{\underline{g}} }_M} - {\eta _a}{{\left\| {{\psi _a}({s_a}(k))} \right\| }^2}{\bar{g}}_M^2}}} \right. \\&\quad {\left. { \times ( - g_M(x(k),x(k - \sigma )){\delta _a}(k) + {\hat{J}}(k))} \right] ^2}\\&\quad + {\gamma _{21}}\zeta _c^2 + {\gamma _{21}}{[ - g_M(x(k),x(k - \sigma )){\delta _a}(k) + J(k)]^2}. \end{aligned} \end{aligned}$$
(46)

Defining \({{\tilde{w}}_c}(k) = {{\hat{w}}_c}(k) - {w_c}\) and subtracting \({w_c}\) from both sides of (28) gives

$$\begin{aligned} \tilde{w}_c(k+1) &= \tilde{w}_c(k) - \eta_c\psi_c(s_c(k))\times[\beta^{N-k+1}p(k) \\&\quad + \hat{w}_c^T(k)\psi_c(s_c(k)) - \hat{w}_c^T(k-1)\psi_c(s_c(k-1))]. \end{aligned}$$
(47)
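Equation (47) is the error form of the critic update (28). The sketch below implements the estimate update implied by (47). It assumes the critic activations at times \(k\) and \(k-1\) are available as vectors and that the utility signal is the \(p(k)\) appearing in (48) to (50). The function name and argument layout are hypothetical. The bracketed term is also returned because its square reappears in (48) to (50).

```python
import numpy as np


def critic_update(w_c_hat, w_c_hat_prev, psi_now, psi_prev, p_k, beta, N, k, eta_c):
    """One critic step consistent with (47).

    Returns the new estimate w^_c(k+1) and the bracketed term
    beta^(N-k+1) p(k) + w^_c(k)^T psi_c(s_c(k)) - w^_c(k-1)^T psi_c(s_c(k-1)).
    """
    w_c_hat = np.asarray(w_c_hat, dtype=float)
    psi_now = np.asarray(psi_now, dtype=float)
    bracket = (beta ** (N - k + 1) * p_k
               + w_c_hat @ psi_now
               - np.asarray(w_c_hat_prev, dtype=float) @ np.asarray(psi_prev, dtype=float))
    return w_c_hat - eta_c * psi_now * bracket, bracket


# Hypothetical numbers: beta^(3-2+1) = 0.25, bracket = 0.25 + 1.0 - 0.0 = 1.25
w_next, bracket = critic_update([1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.0],
                                p_k=1.0, beta=0.5, N=3, k=2, eta_c=0.1)
# bracket = 1.25, w_next = [0.875, -0.125]
```

Under the step-size condition \(0 < \eta_c < 1/\psi_{cm}^2\), the factor \(1 - \eta_c\|\psi_c(s_c(k))\|^2\) multiplying the squared bracket in (49) stays positive.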

Thus, the first-order difference of \({L_3}(k)\) is

$$\begin{aligned} \Delta L_3(k) &= \frac{\gamma_3}{\eta_c}\left\|\tilde{w}_c(k+1)\right\|^2 - \frac{\gamma_3}{\eta_c}\left\|\tilde{w}_c(k)\right\|^2 \\&= \frac{\gamma_3}{\eta_c}\Big\{-2\eta_c\tilde{w}_c^T(k)\psi_c(s_c(k))\big(\beta^{N-k+1}p(k) + \hat{w}_c^T(k)\psi_c(s_c(k)) \\&\quad - \hat{w}_c^T(k-1)\psi_c(s_c(k-1))\big) + \eta_c^2\left\|\psi_c(s_c(k))\right\|^2\big[\beta^{N-k+1}p(k) \\&\quad + \hat{w}_c^T(k)\psi_c(s_c(k)) - \hat{w}_c^T(k-1)\psi_c(s_c(k-1))\big]^2\Big\} \\&= -2\gamma_3\zeta_c(k)\big(\beta^{N-k+1}p(k) + \hat{w}_c^T(k)\psi_c(s_c(k)) \\&\quad - \hat{w}_c^T(k-1)\psi_c(s_c(k-1))\big) + \eta_c\gamma_3\left\|\psi_c(s_c(k))\right\|^2 \\&\quad \times \big[\beta^{N-k+1}p(k) + \hat{w}_c^T(k)\psi_c(s_c(k)) - \hat{w}_c^T(k-1)\psi_c(s_c(k-1))\big]^2. \end{aligned}$$
(48)

Adding and subtracting the term \(\gamma_3[\beta^{N-k+1}p(k) + \hat{w}_c^T(k)\psi_c(s_c(k)) - \hat{w}_c^T(k-1)\psi_c(s_c(k-1))]^2\) in (48), \(\Delta L_3(k)\) becomes

$$\begin{aligned} \begin{aligned} \Delta {L_3}(k)&= -\, {\gamma _3}[1 - {\eta _c}{\left\| {{\psi _c}({s_c}(k))} \right\| ^2}][{\beta ^{N - k + 1}}p(k) \\&\quad + {\hat{w}}_c^T(k){\psi _c}({s_c}(k)) - {\hat{w}}_c^T(k - 1){\psi _c}({s_c}(k - 1)){]^2}\\&\quad + {\gamma _3}[{\beta ^{N - k + 1}}p(k) + {\hat{w}}_c^T(k){\psi _c}({s_c}(k)) \\&\quad - {\hat{w}}_c^T(k - 1){\psi _c}({s_c}(k - 1))][ - 2{\zeta _c}(k) + {\beta ^{N - k + 1}}p(k) \\&\quad + {\hat{w}}_c^T(k){\psi _c}({s_c}(k)) - {\hat{w}}_c^T(k - 1){\psi _c}({s_c}(k - 1))] \\&= -\, {\gamma _3}[1 - {\eta _c}{\left\| {{\psi _c}({s_c}(k))} \right\| ^2}][{\beta ^{N - k + 1}}p(k) \\&\quad + {\hat{w}}_c^T(k){\psi _c}({s_c}(k)) - {\hat{w}}_c^T(k - 1){\psi _c}({s_c}(k - 1)){]^2} \\&\quad + {\gamma _3}[{\beta ^{N - k + 1}}p(k) + w_c^T(k){\psi _c}({s_c}(k)) \\&\quad - {\hat{w}}_c^T(k - 1){\psi _c}({s_c}(k - 1)){]^2} - {\gamma _3}\zeta _c^2(k). \end{aligned} \end{aligned}$$
(49)

Using the equality \({\hat{w}}_c^T(k - 1){\psi _c}({s_c}(k - 1)) = w_c^T(k - 1){\psi _c}({s_c}(k - 1)) + {\zeta _c}(k - 1)\) and the inequality \({(m \pm n)^2} \le 2{m^2} + 2{n^2}\), one obtains

$$\begin{aligned} \begin{aligned} \Delta {L_3}(k)&\le - {\gamma _3}[1 - {\eta _c}{\left\| {{\psi _c}({s_c}(k))} \right\| ^2}][{\beta ^{N - k + 1}}p(k) \\&\quad + {\hat{w}}_c^T(k){\psi _c}({s_c}(k)) - {\hat{w}}_c^T(k - 1){\psi _c}({s_c}(k - 1)){]^2} \\&\quad + 2{\gamma _3}[{\beta ^{N - k + 1}}p(k) + w_c^T(k){\psi _c}({s_c}(k)) \\&\quad - w_c^T(k - 1){\psi _c}({s_c}(k - 1)){]^2} - {\gamma _3}\zeta _c^2(k) + 2{\gamma _3}\zeta _c^2(k - 1). \end{aligned} \end{aligned}$$
(50)
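Two algebraic facts carry (48) to (50). Let \(E\) denote the bracketed critic term and \(\zeta = \zeta_c(k)\). The identity \(E(E - 2\zeta) = (E-\zeta)^2 - \zeta^2\) gives the second equality in (49). The inequality \((m \pm n)^2 \le 2m^2 + 2n^2\) then splits off the \(\zeta_c(k-1)\) term in (50). The randomized check below is only a sanity test of these scalar facts. \(E\), \(m\), and \(n\) are shorthand introduced for this sketch.

```python
import random


def check_critic_algebra(trials=10_000, seed=1):
    """Randomly test the identity in (49) and the inequality used for (50)."""
    rng = random.Random(seed)
    for _ in range(trials):
        E, zeta, m, n = (rng.uniform(-3.0, 3.0) for _ in range(4))
        # identity behind the second equality in (49)
        if abs(E * (E - 2.0 * zeta) - ((E - zeta) ** 2 - zeta ** 2)) > 1e-9:
            return False
        # inequality used to move from (49) to (50)
        bound = 2.0 * m**2 + 2.0 * n**2
        if (m + n) ** 2 > bound + 1e-12 or (m - n) ** 2 > bound + 1e-12:
            return False
    return True


print(check_critic_algebra())  # True
```

The inequality holds because \(2m^2 + 2n^2 - (m \pm n)^2 = (m \mp n)^2 \ge 0\).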

Let \({L_4}(k) = \displaystyle \frac{{{\gamma _4}}}{{{\eta _M}}}\tilde{w}_M^T(k){\tilde{w}}_M(k)\); its first-order difference is

$$\begin{aligned} \Delta {L_4}(k) = \displaystyle \frac{{{\gamma _4}}}{{{\eta _M}}}{\left\| {{{{\tilde{w}}}_M}(k + 1)} \right\| ^2} - \frac{{{\gamma _4}}}{{{\eta _M}}}{\left\| {{{{\tilde{w}}}_M}(k)} \right\| ^2}. \end{aligned}$$
(51)

Subtracting \({w_M}\) from both sides of (15) gives

$$\begin{aligned} {{\tilde{w}}_M}(k + 1) = {{\tilde{w}}_M}(k) - {\eta _M}{\psi _M}(u(k))({\hat{u}}(k - \tau ) - u(k - \tau )). \end{aligned}$$
(52)
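Equation (52) is the error form of the compensator update (15). The sketch below applies one step of the corresponding estimate update to show why the bound \(\eta_M < 1/\bar{\psi}_M^2\) matters. Two assumptions make the effect visible. First, the ideal weights reproduce the delayed control exactly (\(\delta_M = 0\)), an idealization used only for this example. Second, \(\hat{u}(k-\tau) = \hat{w}_M^T\psi_M(u(k))\), as the second equality in (54) implies. Under these, one step scales \(\zeta_M\) by \(1 - \eta_M\|\psi_M(u(k))\|^2\), which lies in \((0,1)\) under the bound. The weights and activation values are hypothetical.

```python
import numpy as np


def compensator_update(w_M_hat, psi_M, u_hat_delayed, u_delayed, eta_M):
    """Estimate update whose error form is (52)."""
    return w_M_hat - eta_M * psi_M * (u_hat_delayed - u_delayed)


def zeta_M_contraction_demo():
    psi_M = np.array([0.8, -0.3])      # hypothetical psi_M(u(k))
    w_M = np.array([1.0, 2.0])         # hypothetical ideal weights
    w_M_hat = np.zeros(2)              # initial estimate
    u_delayed = w_M @ psi_M            # idealization: delta_M = 0
    u_hat_delayed = w_M_hat @ psi_M    # u^(k - tau) = w^_M^T psi_M(u(k))
    eta_M = 0.5 / (psi_M @ psi_M)      # satisfies 0 < eta_M < 1/||psi_M||^2
    w_next = compensator_update(w_M_hat, psi_M, u_hat_delayed, u_delayed, eta_M)
    zeta_before = (w_M_hat - w_M) @ psi_M
    zeta_after = (w_next - w_M) @ psi_M
    return zeta_before, zeta_after, 1.0 - eta_M * (psi_M @ psi_M)


before, after, factor = zeta_M_contraction_demo()
print(before, after, factor)  # roughly -0.2, -0.1, 0.5
```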

Substituting (52) into (51) yields

$$\begin{aligned} \Delta L_4(k) &= \frac{\gamma_4}{\eta_M}\Big(-2\eta_M\tilde{w}_M(k)\psi_M(u(k))(\hat{u}(k-\tau) - u(k-\tau)) \\&\quad + \eta_M^2\psi_M^2(u(k))(\hat{u}(k-\tau) - u(k-\tau))^2\Big). \end{aligned}$$
(53)

Adding and subtracting the term \({\gamma _4}{\left[ {{\hat{u}}(k - \tau ) - u(k - \tau )} \right] ^2}\), \(\Delta {L_4}(k)\) becomes

$$\begin{aligned} \begin{aligned} \Delta {L_4}(k)\, \, \,&= -\, {\gamma _4}[1 - {\eta _M}\psi _M^2(u(k))]{[{\hat{u}}(k - \tau ) - u(k - \tau )]^2} \\&\quad + {\gamma _4}[ - 2{{{\tilde{w}}}_M}(k){\psi _M}(u(k)) + ({\hat{u}}(k - \tau ) - u(k - \tau ))\, ] \\&\quad \times [{\hat{u}}(k - \tau ) - u(k - \tau )] \\&= -\, {\gamma _4}[1 - {\eta _M}\psi _M^2(u(k))]{[{\hat{u}}(k - \tau ) - u(k - \tau )]^2} \\&\quad +{\gamma _4} [{w_M}{\psi _M}(u(k)) - u(k - \tau ) - {\zeta _M}(k)] \\&\quad \times [{w_M}{\psi _M}(u(k)) - u(k - \tau ) + {\zeta _M}(k)] \\&\le - {\gamma _4}[1 - {\eta _M}\psi _M^2(u(k))]{[{\hat{u}}(k - \tau ) - u(k - \tau )]^2} \\&\quad + {\gamma _4}{[{w_M}{\psi _M}(u(k)) - u(k - \tau )]^2} - {\gamma _4}\zeta _M^2(k). \end{aligned} \end{aligned}$$
(54)

By invoking (7) and (8), one obtains

$$\begin{aligned} \begin{aligned} \Delta {L_4}(k)&\le - {\gamma _4}[1 - {\eta _M}\psi _M^2(u(k))]{e_M^2} \\&\quad + {\gamma _4}\delta _M^2(k) - {\gamma _4}\zeta _M^2(k) \end{aligned} \end{aligned}$$
(55)

where \({\zeta _M}(k) = {{\tilde{w}}_M}{\psi _M}(u(k))\).

In addition, let \({L_5}(k) = {\gamma _5}{\left\| {{\zeta _c}(k - 1)} \right\| ^2}\); clearly,

$$\begin{aligned} \Delta {L_5}(k) = {\gamma _5}\zeta _c^2(k) - {\gamma _5}\zeta _c^2(k - 1). \end{aligned}$$
(56)

Combining (41), (46), (50), (55), and (56), the first-order difference of \(L(k)\) satisfies

$$\begin{aligned} \Delta L(k) &= \sum_{i=1}^5 \Delta L_i(k) \\&\le (\gamma_1\bar{g}_M^2 - \gamma_2\underline{g}_M)\zeta_a^2 + \left(\gamma_1\lambda_{\max}^2(\alpha_1) - \frac{\gamma_1}{3}\right)\left\|x(k)\right\|^2 \\&\quad - \gamma_2(\underline{g}_M - \eta_a\left\|\psi_a(s_a(k))\right\|^2\bar{g}_M^2) \\&\quad \times \left[\zeta_a + \frac{1 - \eta_a\left\|\psi_a(s_a(k))\right\|^2\underline{g}_M}{\underline{g}_M - \eta_a\left\|\psi_a(s_a(k))\right\|^2\bar{g}_M^2}\left(-g_M(x(k),x(k-\sigma))\delta_a(k) + \hat{J}(k)\right)\right]^2 \\&\quad + (\gamma_{21} - \gamma_3 + \gamma_5)\zeta_c^2(k) - \gamma_3\left[1 - \eta_c\left\|\psi_c(s_c(k))\right\|^2\right] \\&\quad \times \left[\beta^{N-k+1}p(k) + \hat{w}_c^T(k)\psi_c(s_c(k)) - \hat{w}_c^T(k-1)\psi_c(s_c(k-1))\right]^2 \\&\quad - (\gamma_5 - 2\gamma_3)\zeta_c^2(k-1) - \gamma_4\zeta_M^2(k) - \gamma_4\left[1 - \eta_M\psi_M^2(u(k))\right]e_M^2 + R_1 \\&= -(\gamma_2\underline{g}_M - \gamma_1\bar{g}_M^2)\zeta_a^2 - \gamma_1\left(\frac{1}{3} - \lambda_{\max}^2(\alpha_1)\right)\left\|x(k)\right\|^2 \\&\quad - (\gamma_3 - \gamma_{21} - \gamma_5)\zeta_c^2(k) - \gamma_2(\underline{g}_M - \eta_a\left\|\psi_a(s_a(k))\right\|^2\bar{g}_M^2) \\&\quad \times \left[\zeta_a + \frac{1 - \eta_a\left\|\psi_a(s_a(k))\right\|^2\underline{g}_M}{\underline{g}_M - \eta_a\left\|\psi_a(s_a(k))\right\|^2\bar{g}_M^2}\left(-g_M(x(k),x(k-\sigma))\delta_a(k) + \hat{J}(k)\right)\right]^2 \\&\quad - \gamma_3\left[1 - \eta_c\left\|\psi_c(s_c(k))\right\|^2\right] \\&\quad \times \left[\beta^{N-k+1}p(k) + \hat{w}_c^T(k)\psi_c(s_c(k)) - \hat{w}_c^T(k-1)\psi_c(s_c(k-1))\right]^2 \\&\quad - (\gamma_5 - 2\gamma_3)\zeta_c^2(k-1) - \gamma_4\zeta_M^2(k) - \gamma_4\left[1 - \eta_M\psi_M^2(u(k))\right]e_M^2 + R_1 \end{aligned}$$
(57)

with

$$\begin{aligned} R_1 &= \gamma_1\bar{g}_M^2\delta_a^2 + \gamma_{21}\left[-g_M(x(k),x(k-\sigma))\delta_a(k) + J(k)\right]^2 \\&\quad + 2\gamma_3\left[\beta^{N-k+1}p(k) + w_c^T(k)\psi_c(s_c(k)) - w_c^T(k-1)\psi_c(s_c(k-1))\right]^2 + \gamma_4\delta_M^2(k) \\&\le \gamma_1\bar{g}_M^2\delta_{am}^2 + 2\gamma_{21}\bar{g}_M^2\delta_{am}^2 + 2\gamma_{21}J_m^2 \\&\quad + 12\gamma_3 w_{cm}^2\psi_{cm}^2 + 6\gamma_3 + \gamma_4\bar{\delta}_M^2, \end{aligned}$$

where \({{\bar{g}}_M}\) and \({{\underline{g}} _M}\) stand for the maximum and minimum eigenvalues of the matrix \({g_M}\), respectively.

Choose the parameters such that

$$\begin{aligned} \begin{array}{l} \gamma_1 < \displaystyle\frac{\underline{g}_M\gamma_2}{\bar{g}_M^2},\quad 0 \le \lambda_{\max}(\alpha_1) < \frac{\sqrt{3}}{3},\quad \gamma_3 > \gamma_{21} + \gamma_5,\quad \gamma_5 > 2\gamma_3,\quad 0 < \eta_a < \displaystyle\frac{\underline{g}_M}{\psi_{am}^2\bar{g}_M^2}, \\ 0 < \eta_c < \displaystyle\frac{1}{\psi_{cm}^2},\quad 0 < \eta_M < \displaystyle\frac{1}{\bar{\psi}_M^2}. \end{array} \end{aligned}$$
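The learning-rate conditions in the parameter choice above can be evaluated directly once bounds on the activations are known. The helpers below are a hypothetical convenience. They assume \(\psi_{am}\), \(\psi_{cm}\), and \(\bar{\psi}_M\) are upper bounds on \(\|\psi_a\|\), \(\|\psi_c\|\), and \(\|\psi_M\|\), and that \(\underline{g}_M > 0\). They return the strict upper limits for \(\eta_a\), \(\eta_c\), and \(\eta_M\) and check a proposed triple against them. They cover only these three step-size conditions.

```python
def learning_rate_limits(g_low, g_bar, psi_am, psi_cm, psi_M_bar):
    """Strict upper limits on (eta_a, eta_c, eta_M) from the parameter choice."""
    return (g_low / (psi_am**2 * g_bar**2),
            1.0 / psi_cm**2,
            1.0 / psi_M_bar**2)


def learning_rates_admissible(etas, limits):
    """True if every eta lies strictly between 0 and its limit."""
    return all(0.0 < eta < lim for eta, lim in zip(etas, limits))


limits = learning_rate_limits(g_low=1.0, g_bar=2.0, psi_am=1.0, psi_cm=2.0, psi_M_bar=4.0)
print(limits)                                               # (0.25, 0.25, 0.0625)
print(learning_rates_admissible((0.2, 0.2, 0.05), limits))  # True
print(learning_rates_admissible((0.3, 0.2, 0.05), limits))  # False
```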

Hence, \(\Delta L(k)\) satisfies

$$\begin{aligned} \Delta L(k)\le & {} - ({\gamma _2}{{\underline{g}} _M} - {\gamma _1}{\bar{g}}_M^2)\zeta _a^2 - {\gamma _1}\left( \displaystyle \frac{1}{3} - \lambda _{\max }^2({\alpha _1})\right) {\left\| {x(k)} \right\| ^2} \\&\quad - ({\gamma _3} - {\gamma _{21}} - {\gamma _5})\zeta _c^2(k) - {\gamma _4}\zeta _M^2(k) + {{{\bar{R}}}_1} \end{aligned}$$

where

$$\begin{aligned} {{{\bar{R}}}_1}= & {} {\gamma _1}{\bar{g}}_M^2\delta _{am}^2 + 2{\gamma _{21}}{\bar{g}}_M^2\delta _{am}^2 + 2{\gamma _{21}}J_{m}^2 \\&\quad + 12{\gamma _3}w_{cm}^2\psi _{cm}^2 + 6{\gamma _3} + {\gamma _4}{\bar{\delta }} _M^2. \end{aligned}$$

Thus, \(\Delta L(k) < 0\) if any one of the following inequalities holds:

$$\begin{aligned} \left\| {x(k)} \right\| \ge \sqrt{\frac{{3{{{\bar{R}}}_1}}}{{{\gamma _1}(1 - 3\lambda _{\max }^2({\alpha _1}))}}} \end{aligned}$$

or

$$\begin{aligned} \left\| {{\zeta _a}(k)} \right\| \ge \sqrt{\frac{{{{{\bar{R}}}_1}}}{{{\gamma _2}{{{\underline{g}} }_M} - {\gamma _1}{\bar{g}}_M^2}}} \end{aligned}$$

or

$$\begin{aligned} \left\| {{\zeta _c}(k)} \right\| \ge \sqrt{\frac{{{{{\bar{R}}}_1}}}{{{\gamma _3} - {\gamma _{21}} - {\gamma _5}}}} \end{aligned}$$

or

$$\begin{aligned} \left\| {{\zeta _M}(k)} \right\| \ge \sqrt{\frac{{{{{\bar{R}}}_1}}}{{{\gamma _4}}}}. \end{aligned}$$
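To see the size of the ultimate bounds, the sketch below evaluates \(\bar{R}_1\) from its definition and then the four radii. Each radius comes from requiring one negative term in the final bound on \(\Delta L(k)\) to exceed \(\bar{R}_1\) on its own. For example, \(\gamma_1(\tfrac13 - \lambda_{\max}^2(\alpha_1))\|x(k)\|^2 > \bar{R}_1\). The function names and all numerical values in the usage lines are hypothetical.

```python
import math


def r1_bar(g1, g21, g3, g4, g_bar, delta_am, J_m, w_cm, psi_cm, delta_M_bar):
    """Upper bound R1_bar on the residual terms collected in R_1."""
    return (g1 * g_bar**2 * delta_am**2 + 2 * g21 * g_bar**2 * delta_am**2
            + 2 * g21 * J_m**2 + 12 * g3 * w_cm**2 * psi_cm**2 + 6 * g3
            + g4 * delta_M_bar**2)


def uub_radii(R1, g1, g2, g21, g3, g4, g5, g_low, g_bar, lam_max):
    """Radii beyond which Delta L(k) is negative, one per signal."""
    coeffs = {
        "x": g1 * (1.0 / 3.0 - lam_max**2),    # coefficient of ||x(k)||^2
        "zeta_a": g2 * g_low - g1 * g_bar**2,  # coefficient of zeta_a^2
        "zeta_c": g3 - g21 - g5,               # coefficient of zeta_c^2(k)
        "zeta_M": g4,                          # coefficient of zeta_M^2(k)
    }
    for name, c in coeffs.items():
        if c <= 0:
            raise ValueError(f"coefficient for {name} must be positive, got {c}")
    return {name: math.sqrt(R1 / c) for name, c in coeffs.items()}


R1 = r1_bar(g1=1.0, g21=0.5, g3=3.0, g4=1.0, g_bar=1.5,
            delta_am=0.1, J_m=1.0, w_cm=1.0, psi_cm=1.0, delta_M_bar=0.1)
radii = uub_radii(R1, g1=1.0, g2=4.0, g21=0.5, g3=3.0, g4=1.0, g5=1.0,
                  g_low=1.0, g_bar=1.5, lam_max=0.3)
print(R1, radii)
```

The `ValueError` guard reflects the fact that each radius is meaningful only when its coefficient is positive, which is what the parameter conditions are meant to secure.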

About this article

Cite this article

Liang, Y., Zhang, H., Xiao, G. et al. Reinforcement learning-based online adaptive controller design for a class of unknown nonlinear discrete-time systems with time delays. Neural Comput & Applic 30, 1733–1745 (2018). https://doi.org/10.1007/s00521-018-3537-7
