Reinforcement learning-based online adaptive controller design for a class of unknown nonlinear discrete-time systems with time delays

Liang, Yuling; Zhang, Huaguang; Xiao, Geyang; Jiang, He

doi:10.1007/s00521-018-3537-7

Review
Published: 31 July 2018

Volume 30, pages 1733–1745, (2018)
Neural Computing and Applications

Yuling Liang¹,
Huaguang Zhang ORCID: orcid.org/0000-0001-5702-4845¹,
Geyang Xiao¹ &
…
He Jiang¹

Abstract

This paper addresses an online adaptive control strategy for a class of unknown nonlinear discrete-time systems with time delays. The main objective is to establish an online adaptive control strategy based on a reinforcement learning (RL) algorithm such that a nonquadratic performance index is minimized and the closed-loop system with time delays is stable. To simplify the time-delay control system, a time-delay function is designed to eliminate the control delay term. An RL-based online adaptive control algorithm is then presented that approximates a suitable control law and optimizes the long-term performance function. Using Lyapunov theory, it is proved that the proposed online adaptive controller is effective and that all signals of the control system are uniformly ultimately bounded. Simulation results indicate the validity and feasibility of the proposed adaptive control strategy.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (61433004, 61627809, 61621004), and IAPI Fundamental Research Funds 2013ZCX14.

Author information

Authors and Affiliations

Northeastern University, Shenyang, Liaoning, People’s Republic of China
Yuling Liang, Huaguang Zhang, Geyang Xiao & He Jiang

Corresponding author

Correspondence to Huaguang Zhang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest to this work.

Appendix

1.1 Proof of Theorem 1

Consider the Lyapunov candidate function defined as

$$\begin{aligned} L(k)&= \sum \limits _{i = 1}^5 {L_i}(k) \\&= \frac{\gamma _1}{3}x^T(k)x(k) + \frac{\gamma _2}{\eta _a}{\tilde{w}}_a^T(k){\tilde{w}}_a(k) + \frac{\gamma _3}{\eta _c}{\tilde{w}}_c^T(k){\tilde{w}}_c(k) \\&\quad + \frac{\gamma _4}{\eta _M}{\tilde{w}}_M^T(k){\tilde{w}}_M(k) + {\gamma _5}{\left\| {\zeta _c}(k - 1) \right\| ^2} \end{aligned}$$

(40)
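
As a numerical illustration of (40), the following minimal sketch evaluates the Lyapunov candidate for given error vectors. The function and argument names are hypothetical (they do not come from the paper), and all vectors are passed as plain Python sequences.

```python
from typing import Sequence


def sq_norm(v: Sequence[float]) -> float:
    """Squared Euclidean norm v^T v."""
    return sum(c * c for c in v)


def lyapunov_candidate(x, w_a_err, w_c_err, w_M_err, zeta_c_prev,
                       gamma, eta_a, eta_c, eta_M):
    """Evaluate L(k) of (40).

    x            : state x(k)
    w_*_err      : weight estimation errors w~_a(k), w~_c(k), w~_M(k)
    zeta_c_prev  : scalar zeta_c(k - 1)
    gamma        : the five design parameters (gamma_1, ..., gamma_5)
    eta_*        : learning rates of the three networks
    """
    g1, g2, g3, g4, g5 = gamma
    return (g1 / 3.0 * sq_norm(x)
            + g2 / eta_a * sq_norm(w_a_err)
            + g3 / eta_c * sq_norm(w_c_err)
            + g4 / eta_M * sq_norm(w_M_err)
            + g5 * zeta_c_prev ** 2)
```

Each term is nonnegative when the design parameters and learning rates are positive, so the function returns zero only when the state and all error signals vanish.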

where ${\gamma _i} \in {R^ + }$, $i = 1,2,3,4,5$, are the design parameters.

Let ${L_1}(k) = \displaystyle \frac{{{\gamma _1}}}{3}x^T(k)x(k)$. Its first-order difference is

$$\begin{aligned} \begin{aligned} \Delta {L_1}(k)&= \displaystyle \frac{{{\gamma _1}}}{3}(x_{}^T(k + 1)x(k + 1) - x_{}^T(k)x(k)) \\&\le {\gamma _1}{\bar{g}}_M^2\zeta _a^2 + {\gamma _1}{\bar{g}}_M^2\delta _a^2 + {\gamma _1}\lambda _{\max }^2({\alpha _1}){\left\| {x(k)} \right\| ^2} - \displaystyle \frac{{{\gamma _1}}}{3}{\left\| {x(k)} \right\| ^2} \end{aligned} \end{aligned}$$

(41)

where ${\lambda _{\max }}({\alpha _1})$ is the maximum eigenvalue of the matrix ${\alpha _1}$.

Let ${L_2}(k) = \displaystyle \frac{{{\gamma _2}}}{{{\eta _a}}}\tilde{w}_a^T(k){\tilde{w}}_a(k)$. Its first-order difference is

$$\begin{aligned} \begin{aligned} \Delta {L_2}(k) = \frac{{{\gamma _2}}}{{{\eta _a}}}{\left\| {{{{\tilde{w}}}_a}(k + 1)} \right\| ^2} - \frac{{{\gamma _2}}}{{{\eta _a}}}{\left\| {{{{\tilde{w}}}_a}(k)} \right\| ^2}. \end{aligned} \end{aligned}$$

(42)

Subtracting ${w_a}$ from both sides of (34), one gets

$$\begin{aligned} {{\tilde{w}}_a}(k + 1) = {{\tilde{w}}_a}(k) - {\eta _a}{\psi _a}({s_a}(k))(x(k + 1) - {\alpha _1}x(k) + {\hat{J}}(k)). \end{aligned}$$

(43)
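
The error recursion (43) can be sketched numerically as follows. This is a minimal illustration that assumes a scalar state, so the bracketed training signal is a scalar; the function and argument names are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def actor_error_update(w_err: Sequence[float], psi_a: Sequence[float],
                       eta_a: float, x_next: float, x: float,
                       alpha1: float, J_hat: float) -> List[float]:
    """One step of (43), scalar-state case (an assumption of this sketch).

    w~_a(k+1) = w~_a(k) - eta_a * psi_a(s_a(k)) * (x(k+1) - alpha1*x(k) + J^(k))
    """
    # Scalar training signal of the action network.
    e_a = x_next - alpha1 * x + J_hat
    # Gradient-type correction along the basis vector psi_a(s_a(k)).
    return [w - eta_a * p * e_a for w, p in zip(w_err, psi_a)]
```

For example, `actor_error_update([1.0, -0.5], [0.5, 0.5], 0.1, 0.2, 1.0, 0.5, 0.1)` moves both components by `-eta_a * 0.5 * e_a` with `e_a = 0.2 - 0.5 + 0.1`.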

Substituting (43) into (42) yields

$$\begin{aligned} \Delta {L_2}(k)&\le \frac{\gamma _2}{\eta _a}\left[ \eta _a^2{\left\| {\psi _a}({s_a}(k)) \right\| ^2}{({\alpha _1}x(k) - x(k + 1) - {\hat{J}}(k))^2}\right] \\&\quad - \frac{2{\gamma _2}}{\eta _a}{\tilde{w}}_a(k)\left[ {\eta _a}{\psi _a}({s_a}(k))({\hat{J}}(k) + x(k + 1) - {\alpha _1}x(k))\right] \\&= {\eta _a}{\gamma _2}{\left\| {\psi _a}({s_a}(k)) \right\| ^2}{({\alpha _1}x(k) - x(k + 1) - {\hat{J}}(k))^2} - 2{\gamma _2}{\zeta _a}(k) \\&\quad \times ({\hat{J}}(k) + x(k + 1) - {\alpha _1}x(k))\\&\le {\eta _a}{\gamma _2}{\left\| {\psi _a}({s_a}(k)) \right\| ^2}{({\alpha _1}x(k) - x(k + 1) - {\hat{J}}(k))^2} - 2{\gamma _2}{\zeta _a}(k)\\&\quad \times ({g_M}(x(k),x(k - \sigma )){\zeta _a}(k) - {g_M}(x(k),x(k - \sigma )){\delta _a}(k) + {\hat{J}}(k)) \\&\le {\gamma _2}\left\{ - ({\underline{g}}_M - {\eta _a}{\left\| {\psi _a}({s_a}(k)) \right\| ^2}{\bar{g}}_M^2)\zeta _a^2(k) - 2{\zeta _a}(k) \right. \\&\quad \times [1 - {\eta _a}{\left\| {\psi _a}({s_a}(k)) \right\| ^2}{g_M}(x(k),x(k - \sigma ))]\\&\quad \times [ - {g_M}(x(k),x(k - \sigma )){\delta _a}(k) + {\hat{J}}(k)] - {\underline{g}}_M\zeta _a^2(k)\\&\quad \left. + {\eta _a}{\left\| {\psi _a}({s_a}(k)) \right\| ^2}{[ - {g_M}(x(k),x(k - \sigma )){\delta _a}(k) + {\hat{J}}(k)]^2} \right\} . \end{aligned}$$

(44)

Subtracting and adding the term $\displaystyle \frac{{(1 - {\eta _a}{\left\| {\psi _a}({s_a}(k)) \right\| ^2}{\underline{g}}_M)^2}}{{\underline{g}}_M - {\eta _a}{\left\| {\psi _a}({s_a}(k)) \right\| ^2}{\bar{g}}_M^2}{[ - {g_M}(x(k),x(k - \sigma )){\delta _a}(k) + {\hat{J}}(k)]^2}$ in the above inequality and completing the square, one obtains

$$\begin{aligned} \begin{aligned} \Delta {L_2}(k)&\le {\gamma _2}\left\{ { - {{{\underline{g}} }_M}\zeta _a^2 + \frac{{{{(1 - {\eta _a}{{\left\| {{\psi _a}({s_a}(k))} \right\| }^2}{{{\underline{g}} }_M})}^2}}}{{{{{\underline{g}} }_M} - {\eta _a}{{\left\| {{\psi _a}({s_a}(k))} \right\| }^2}{\bar{g}}_M^2}}} \right. \\&\quad \times {[ - {g_M}(x(k),x(k - \sigma )){\delta _a}(k) + {\hat{J}}(k)]^2} \\&\quad - ({{\underline{g}} _M} - {\eta _a}{\left\| {{\psi _a}({s_a}(k))} \right\| ^2}{\bar{g}}_M^2)\\&\quad \times \left[ {{\zeta _a} + \frac{{1 - {\eta _a}{{\left\| {{\psi _a}({s_a}(k))} \right\| }^2}{{{\underline{g}} }_M}}}{{{{{\underline{g}} }_M} - {\eta _a}{{\left\| {{\psi _a}({s_a}(k))} \right\| }^2}{\bar{g}}_M^2}}} \right. \\&\quad {\left. { \times ( - {g_M}(x(k),x(k - \sigma )){\delta _a}(k) + {\hat{J}}(k))} \right] ^2}\\&\quad + \left. {{\eta _a}{{\left\| {{\psi _a}({s_a}(k))} \right\| }^2}{{[ - {g_M}(x(k),x(k - \sigma )){\delta _a}(k) + {\hat{J}}(k)]}^2}} \right\} .\\ \end{aligned} \end{aligned}$$

(45)

Define ${\gamma _2} = {\gamma _{21}}{\gamma _{22}}$, where ${\gamma _{22}}$ satisfies ${\gamma _{22}}\left[ \displaystyle \frac{{(1 - {\eta _a}{\left\| {\psi _a}({s_a}(k)) \right\| ^2}{\underline{g}}_M)^2}}{{\underline{g}}_M - {\eta _a}{\left\| {\psi _a}({s_a}(k)) \right\| ^2}{\bar{g}}_M^2} + {\eta _a}{\left\| {\psi _a}({s_a}(k)) \right\| ^2}\right] \le 1/2$. Thus, (45) becomes

$$\begin{aligned} \begin{aligned} \Delta {L_2}(k)&\le -{\gamma _2}({{\underline{g}} _M} - {\eta _a}{\left\| {{\psi _a}({s_a}(k))} \right\| ^2}{\bar{g}}_M^2)\left[ {{\zeta _a} + \frac{{1 - {\eta _a}{{\left\| {{\psi _a}({s_a}(k))} \right\| }^2}{{{\underline{g}} }_M}}}{{{{{\underline{g}} }_M} - {\eta _a}{{\left\| {{\psi _a}({s_a}(k))} \right\| }^2}{\bar{g}}_M^2}}} \right. \\&\quad {\left. { \times ( - g_M(x(k),x(k - \sigma )){\delta _a}(k) + {\hat{J}}(k))} \right] ^2}\\&\quad - {\gamma _2}{{\underline{g}} _M}\zeta _a^2 + \frac{{{\gamma _{21}}}}{2}{[ - g_M(x(k),x(k - \sigma )){\delta _a}(k) + {\hat{J}}(k)]^2}\\&\le - {\gamma _2}{{\underline{g}} _M}\zeta _a^2 - {\gamma _2}({{\underline{g}} _M} - {\eta _a}{\left\| {{\psi _a}({s_a}(k))} \right\| ^2}{\bar{g}}_M^2)\\&\quad \times \, \left[ {{\zeta _a} + \frac{{1 - {\eta _a}{{\left\| {{\psi _a}({s_a}(k))} \right\| }^2}{{{\underline{g}} }_M}}}{{{{{\underline{g}} }_M} - {\eta _a}{{\left\| {{\psi _a}({s_a}(k))} \right\| }^2}{\bar{g}}_M^2}}} \right. \\&\quad {\left. { \times ( - g_M(x(k),x(k - \sigma )){\delta _a}(k) + {\hat{J}}(k))} \right] ^2}\\&\quad + {\gamma _{21}}\zeta _c^2 + {\gamma _{21}}{[ - g_M(x(k),x(k - \sigma )){\delta _a}(k) + J(k)]^2}. \end{aligned} \end{aligned}$$

(46)

Defining ${{\tilde{w}}_c}(k) = {{\hat{w}}_c}(k) - {w_c}$ and subtracting ${w_c}$ from both sides of (28), one gets

$$\begin{aligned} {{\tilde{w}}_c}(k + 1)&= {{\tilde{w}}_c}(k) - {\eta _c}{\psi _c}({s_c}(k))\times [{\beta ^{N - k + 1}}p(k) \\&\quad + {\hat{w}}_c^T(k){\psi _c}({s_c}(k)) - {\hat{w}}_c^T(k - 1){\psi _c}({s_c}(k - 1))]. \end{aligned}$$

(47)
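
The critic error recursion (47) has the same gradient-type structure, driven by a temporal-difference-like signal. The sketch below is only an illustration under stated assumptions: the names are hypothetical, vectors are plain Python sequences, and the argument `utility` stands for the utility term that multiplies $\beta^{N-k+1}$ in (47).

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product a^T b."""
    return sum(ai * bi for ai, bi in zip(a, b))


def critic_error_update(w_err, w_hat, w_hat_prev, psi_c, psi_c_prev,
                        eta_c, beta, N, k, utility) -> List[float]:
    """One step of (47).

    w_err        : critic weight error w~_c(k)
    w_hat        : critic weight estimate w^_c(k)
    w_hat_prev   : previous estimate w^_c(k-1)
    psi_c        : basis vector psi_c(s_c(k))
    psi_c_prev   : basis vector psi_c(s_c(k-1))
    """
    # Bracketed signal: discounted utility plus the change in the critic output.
    e_c = (beta ** (N - k + 1) * utility
           + dot(w_hat, psi_c) - dot(w_hat_prev, psi_c_prev))
    return [w - eta_c * q * e_c for w, q in zip(w_err, psi_c)]
```

The update leaves the error unchanged whenever the bracketed signal is zero, which is the fixed point the critic tuning law drives toward.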

Thus, with ${L_3}(k) = \displaystyle \frac{{{\gamma _3}}}{{{\eta _c}}}\tilde{w}_c^T(k){\tilde{w}}_c(k)$, the first-order difference of ${L_3}(k)$ is

$$\begin{aligned} \Delta {L_3}(k)&= \frac{\gamma _3}{\eta _c}{\left\| {\tilde{w}}_c(k + 1) \right\| ^2} - \frac{\gamma _3}{\eta _c}{\left\| {\tilde{w}}_c(k) \right\| ^2}\\&= \frac{\gamma _3}{\eta _c}\left\{ - 2{\eta _c}{\tilde{w}}_c^T(k){\psi _c}({s_c}(k))[{\beta ^{N - k + 1}}p(k) + {\hat{w}}_c^T(k){\psi _c}({s_c}(k)) \right. \\&\quad - {\hat{w}}_c^T(k - 1){\psi _c}({s_c}(k - 1))] + \eta _c^2{\left\| {\psi _c}({s_c}(k)) \right\| ^2}[{\beta ^{N - k + 1}}p(k)\\&\quad \left. + {\hat{w}}_c^T(k){\psi _c}({s_c}(k)) - {\hat{w}}_c^T(k - 1){\psi _c}({s_c}(k - 1))]^2 \right\} \\&= - 2{\gamma _3}{\zeta _c}(k)[{\beta ^{N - k + 1}}p(k) + {\hat{w}}_c^T(k){\psi _c}({s_c}(k)) \\&\quad - {\hat{w}}_c^T(k - 1){\psi _c}({s_c}(k - 1))] + {\eta _c}{\gamma _3}{\left\| {\psi _c}({s_c}(k)) \right\| ^2} \\&\quad \times [{\beta ^{N - k + 1}}p(k) + {\hat{w}}_c^T(k){\psi _c}({s_c}(k)) - {\hat{w}}_c^T(k - 1){\psi _c}({s_c}(k - 1))]^2. \end{aligned}$$

(48)

Subtracting and adding the term ${\gamma _3}{[{\beta ^{N - k + 1}}p(k) + {\hat{w}}_c^T(k){\psi _c}({s_c}(k)) - {\hat{w}}_c^T(k - 1){\psi _c}({s_c}(k - 1))]^2}$ in (48), $\Delta {L_3}(k)$ becomes

$$\begin{aligned} \begin{aligned} \Delta {L_3}(k)&= -\, {\gamma _3}[1 - {\eta _c}{\left\| {{\psi _c}({s_c}(k))} \right\| ^2}][{\beta ^{N - k + 1}}p(k) \\&\quad + {\hat{w}}_c^T(k){\psi _c}({s_c}(k)) - {\hat{w}}_c^T(k - 1){\psi _c}({s_c}(k - 1)){]^2}\\&\quad + {\gamma _3}[{\beta ^{N - k + 1}}p(k) + {\hat{w}}_c^T(k){\psi _c}({s_c}(k)) \\&\quad - {\hat{w}}_c^T(k - 1){\psi _c}({s_c}(k - 1))][ - 2{\zeta _c}(k) + {\beta ^{N - k + 1}}p(k) \\&\quad + {\hat{w}}_c^T(k){\psi _c}({s_c}(k)) - {\hat{w}}_c^T(k - 1){\psi _c}({s_c}(k - 1))] \\&= -\, {\gamma _3}[1 - {\eta _c}{\left\| {{\psi _c}({s_c}(k))} \right\| ^2}][{\beta ^{N - k + 1}}p(k) \\&\quad + {\hat{w}}_c^T(k){\psi _c}({s_c}(k)) - {\hat{w}}_c^T(k - 1){\psi _c}({s_c}(k - 1)){]^2} \\&\quad + {\gamma _3}[{\beta ^{N - k + 1}}p(k) + w_c^T(k){\psi _c}({s_c}(k)) \\&\quad - {\hat{w}}_c^T(k - 1){\psi _c}({s_c}(k - 1)){]^2} - {\gamma _3}\zeta _c^2(k). \end{aligned} \end{aligned}$$

(49)

According to the equality ${\hat{w}}_c^T(k - 1){\psi _c}({s_c}(k - 1)) = w_c^T(k - 1){\psi _c}({s_c}(k - 1)) + {\zeta _c}(k - 1)$ and the fact ${(m \pm n)^2} \le 2{m^2} + 2{n^2}$, one can deduce

$$\begin{aligned} \Delta {L_3}(k)&\le - {\gamma _3}[1 - {\eta _c}{\left\| {\psi _c}({s_c}(k)) \right\| ^2}][{\beta ^{N - k + 1}}p(k) \\&\quad + {\hat{w}}_c^T(k){\psi _c}({s_c}(k)) - {\hat{w}}_c^T(k - 1){\psi _c}({s_c}(k - 1))]^2 \\&\quad + 2{\gamma _3}[{\beta ^{N - k + 1}}p(k) + w_c^T(k){\psi _c}({s_c}(k)) \\&\quad - w_c^T(k - 1){\psi _c}({s_c}(k - 1))]^2 - {\gamma _3}\zeta _c^2(k) + 2{\gamma _3}\zeta _c^2(k - 1). \end{aligned}$$

(50)
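
For completeness, the elementary inequality ${(m \pm n)^2} \le 2{m^2} + 2{n^2}$ invoked before (50) follows from a one-line identity, stated here for arbitrary real scalars $m$ and $n$:

$$\begin{aligned} 2{m^2} + 2{n^2} - {(m \pm n)^2} = {m^2} \mp 2mn + {n^2} = {(m \mp n)^2} \ge 0. \end{aligned}$$

In the step leading to (50) it is applied with $n = {\zeta _c}(k - 1)$ and $m$ equal to the remaining bracketed terms, which is where the contribution $2{\gamma _3}\zeta _c^2(k - 1)$ comes from.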

Let ${L_4}(k) = \displaystyle \frac{{{\gamma _4}}}{{{\eta _M}}}\tilde{w}_M^T(k){\tilde{w}}_M(k)$. Its first-order difference is

$$\begin{aligned} \Delta {L_4}(k) = \displaystyle \frac{{{\gamma _4}}}{{{\eta _M}}}{\left\| {{{{\tilde{w}}}_M}(k + 1)} \right\| ^2} - \frac{{{\gamma _4}}}{{{\eta _M}}}{\left\| {{{{\tilde{w}}}_M}(k)} \right\| ^2}. \end{aligned}$$

(51)

Subtracting ${w_M}$ from both sides of (15), one gets

$$\begin{aligned} {{\tilde{w}}_M}(k + 1) = {{\tilde{w}}_M}(k) - {\eta _M}{\psi _M}(u(k))({\hat{u}}(k - \tau ) - u(k - \tau )). \end{aligned}$$

(52)

Substituting (52) into (51) yields

$$\begin{aligned} \Delta {L_4}(k)&= \frac{\gamma _4}{\eta _M}\left( - 2{\eta _M}{\tilde{w}}_M(k){\psi _M}(u(k))({\hat{u}}(k - \tau ) - u(k - \tau )) \right. \\&\quad \left. + \eta _M^2\psi _M^2(u(k)){({\hat{u}}(k - \tau ) - u(k - \tau ))^2} \right) . \end{aligned}$$

(53)

Subtracting and adding the term ${\gamma _4}{\left[ {{\hat{u}}(k - \tau ) - u(k - \tau )} \right] ^2}$, $\Delta {L_4}(k)$ becomes

$$\begin{aligned} \begin{aligned} \Delta {L_4}(k)\, \, \,&= -\, {\gamma _4}[1 - {\eta _M}\psi _M^2(u(k))]{[{\hat{u}}(k - \tau ) - u(k - \tau )]^2} \\&\quad + {\gamma _4}[ - 2{{{\tilde{w}}}_M}(k){\psi _M}(u(k)) + ({\hat{u}}(k - \tau ) - u(k - \tau ))\, ] \\&\quad \times [{\hat{u}}(k - \tau ) - u(k - \tau )] \\&= -\, {\gamma _4}[1 - {\eta _M}\psi _M^2(u(k))]{[{\hat{u}}(k - \tau ) - u(k - \tau )]^2} \\&\quad +{\gamma _4} [{w_M}{\psi _M}(u(k)) - u(k - \tau ) - {\zeta _M}(k)] \\&\quad \times [{w_M}{\psi _M}(u(k)) - u(k - \tau ) + {\zeta _M}(k)] \\&\le - {\gamma _4}[1 - {\eta _M}\psi _M^2(u(k))]{[{\hat{u}}(k - \tau ) - u(k - \tau )]^2} \\&\quad + {\gamma _4}{[{w_M}{\psi _M}(u(k)) - u(k - \tau )]^2} - {\gamma _4}\zeta _M^2(k). \end{aligned} \end{aligned}$$

(54)

By invoking (7) and (8), one obtains

$$\begin{aligned} \begin{aligned} \Delta {L_4}(k)&\le - {\gamma _4}[1 - {\eta _M}\psi _M^2(u(k))]{e_M^2} \\&\quad + {\gamma _4}\delta _M^2(k) - {\gamma _4}\zeta _M^2(k) \end{aligned} \end{aligned}$$

(55)

where ${\zeta _M}(k) = {{\tilde{w}}_M}{\psi _M}(u(k))$.

In addition, let ${L_5}(k) = {\gamma _5}{\left\| {{\zeta _c}(k - 1)} \right\| ^2}$; it is obvious that

$$\begin{aligned} \Delta {L_5}(k) = {\gamma _5}\zeta _c^2(k) - {\gamma _5}\zeta _c^2(k - 1). \end{aligned}$$

(56)

Combining (41), (46), (50), (55) and (56), the first-order difference of $L(k)$ becomes

$$\begin{aligned} \Delta L(k)&= \sum \limits _{i = 1}^5 \Delta {L_i}(k) \\&\le ({\gamma _1}{\bar{g}}_M^2 - {\gamma _2}{\underline{g}}_M)\zeta _a^2 + \left( {\gamma _1}\lambda _{\max }^2({\alpha _1}) - \frac{\gamma _1}{3}\right) {\left\| x(k) \right\| ^2} \\&\quad - {\gamma _2}({\underline{g}}_M - {\eta _a}{\left\| {\psi _a}({s_a}(k)) \right\| ^2}{\bar{g}}_M^2) \\&\quad \times {\left[ {\zeta _a} + \frac{1 - {\eta _a}{\left\| {\psi _a}({s_a}(k)) \right\| ^2}{\underline{g}}_M}{{\underline{g}}_M - {\eta _a}{\left\| {\psi _a}({s_a}(k)) \right\| ^2}{\bar{g}}_M^2}( - {g_M}(x(k),x(k - \sigma )){\delta _a}(k) + {\hat{J}}(k)) \right] ^2} \\&\quad + ({\gamma _{21}} - {\gamma _3} + {\gamma _5})\zeta _c^2(k) - {\gamma _3}[1 - {\eta _c}{\left\| {\psi _c}({s_c}(k)) \right\| ^2}] \\&\quad \times {[{\beta ^{N - k + 1}}p(k) + {\hat{w}}_c^T(k){\psi _c}({s_c}(k)) - {\hat{w}}_c^T(k - 1){\psi _c}({s_c}(k - 1))]^2} \\&\quad - ({\gamma _5} - 2{\gamma _3})\zeta _c^2(k - 1) - {\gamma _4}\zeta _M^2(k) - {\gamma _4}[1 - {\eta _M}\psi _M^2(u(k))]e_M^2 + {R_1} \\&= - ({\gamma _2}{\underline{g}}_M - {\gamma _1}{\bar{g}}_M^2)\zeta _a^2 - {\gamma _1}\left( \frac{1}{3} - \lambda _{\max }^2({\alpha _1})\right) {\left\| x(k) \right\| ^2} \\&\quad - ({\gamma _3} - {\gamma _{21}} - {\gamma _5})\zeta _c^2(k) - {\gamma _2}({\underline{g}}_M - {\eta _a}{\left\| {\psi _a}({s_a}(k)) \right\| ^2}{\bar{g}}_M^2) \\&\quad \times {\left[ {\zeta _a} + \frac{1 - {\eta _a}{\left\| {\psi _a}({s_a}(k)) \right\| ^2}{\underline{g}}_M}{{\underline{g}}_M - {\eta _a}{\left\| {\psi _a}({s_a}(k)) \right\| ^2}{\bar{g}}_M^2}( - {g_M}(x(k),x(k - \sigma )){\delta _a}(k) + {\hat{J}}(k)) \right] ^2} \\&\quad - {\gamma _3}[1 - {\eta _c}{\left\| {\psi _c}({s_c}(k)) \right\| ^2}] \\&\quad \times {[{\beta ^{N - k + 1}}p(k) + {\hat{w}}_c^T(k){\psi _c}({s_c}(k)) - {\hat{w}}_c^T(k - 1){\psi _c}({s_c}(k - 1))]^2}\\&\quad - ({\gamma _5} - 2{\gamma _3})\zeta _c^2(k - 1) - {\gamma _4}\zeta _M^2(k) - {\gamma _4}[1 - {\eta _M}\psi _M^2(u(k))]e_M^2 + {R_1} \end{aligned}$$

(57)

with

$$\begin{aligned} {R_1}&= {\gamma _1}{\bar{g}}_M^2\delta _a^2 + {\gamma _{21}}{[ - {g_M}(x(k),x(k - \sigma )){\delta _a}(k) + J(k)]^2} \\&\quad + 2{\gamma _3}[{\beta ^{N - k + 1}}p(k) + w_c^T(k){\psi _c}({s_c}(k)) \\&\quad - w_c^T(k - 1){\psi _c}({s_c}(k - 1))]^2 + {\gamma _4}\delta _M^2(k) \\&\le {\gamma _1}{\bar{g}}_M^2\delta _{am}^2 + 2{\gamma _{21}}{\bar{g}}_M^2\delta _{am}^2 + 2{\gamma _{21}}J_{m}^2 \\&\quad + 12{\gamma _3}w_{cm}^2\psi _{cm}^2 + 6{\gamma _3} + {\gamma _4}{\bar{\delta }} _M^2, \end{aligned}$$

where ${{\bar{g}}_M}$ and ${{\underline{g}} _M}$ stand for the maximum and minimum eigenvalues of the matrix ${g_M}$, respectively.

Choose the parameters as

$$\begin{aligned}&{\gamma _1} \le \frac{{\underline{g}}_M{\gamma _2}}{{\bar{g}}_M^2},\quad 0 \le {\lambda _{\max }}({\alpha _1}) \le \frac{\sqrt{3}}{3},\quad {\gamma _3}> {\gamma _{21}} + {\gamma _5},\quad {\gamma _5} > 2{\gamma _3}, \\&0<{\eta _a}<\frac{{\underline{g}}_M}{\psi _{am}^2{\bar{g}}_M^2},\quad 0<{\eta _c}< \frac{1}{\psi _{cm}^2},\quad 0< {\eta _M} <\frac{1}{{\bar{\psi }} _M^2}. \end{aligned}$$
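
A quick way to screen candidate learning rates against the conditions above is sketched below. It covers only the learning-rate bounds and the eigenvalue condition on ${\alpha _1}$, not the conditions on the ${\gamma _i}$. The function and argument names are hypothetical, and the bounds ${\underline{g}}_M$, ${\bar{g}}_M$, ${\psi _{am}}$, ${\psi _{cm}}$ and ${\bar{\psi }}_M$ are assumed to be known to the designer.

```python
import math


def learning_rates_admissible(eta_a: float, eta_c: float, eta_M: float,
                              g_lower: float, g_upper: float,
                              psi_am: float, psi_cm: float,
                              psi_M_bar: float, lam_max: float) -> bool:
    """Check the learning-rate and eigenvalue conditions listed above.

    g_lower, g_upper : lower and upper bounds of g_M
    psi_am, psi_cm   : bounds on the actor and critic basis vectors
    psi_M_bar        : bound on the delay-model basis function
    lam_max          : maximum eigenvalue of alpha_1
    """
    actor_ok = 0.0 < eta_a < g_lower / (psi_am ** 2 * g_upper ** 2)
    critic_ok = 0.0 < eta_c < 1.0 / psi_cm ** 2
    model_ok = 0.0 < eta_M < 1.0 / psi_M_bar ** 2
    eig_ok = 0.0 <= lam_max <= math.sqrt(3.0) / 3.0
    return actor_ok and critic_ok and model_ok and eig_ok
```

With `g_lower=0.5`, `g_upper=1.0` and unit basis bounds, the actor rate must stay below 0.5, so `eta_a=0.1` passes while `eta_a=0.6` does not.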

Hence, $\Delta L(k)$ becomes

$$\begin{aligned} \Delta L(k)&\le - ({\gamma _2}{{\underline{g}} _M} - {\gamma _1}{\bar{g}}_M^2)\zeta _a^2 - {\gamma _1}\left( \displaystyle \frac{1}{3} - \lambda _{\max }^2({\alpha _1})\right) {\left\| {x(k)} \right\| ^2} \\&\quad - ({\gamma _3} - {\gamma _{21}} - {\gamma _5})\zeta _c^2(k) - {\gamma _4}\zeta _M^2(k) + {{{\bar{R}}}_1} \end{aligned}$$

where

$$\begin{aligned} {{{\bar{R}}}_1}&= {\gamma _1}{\bar{g}}_M^2\delta _{am}^2 + 2{\gamma _{21}}{\bar{g}}_M^2\delta _{am}^2 + 2{\gamma _{21}}J_{m}^2 \\&\quad + 12{\gamma _3}w_{cm}^2\psi _{cm}^2 + 6{\gamma _3} + {\gamma _4}{\bar{\delta }} _M^2. \end{aligned}$$

Thus, if one of the following inequalities holds, $\Delta L(k) < 0.$

$$\begin{aligned} \left\| {x(k)} \right\| \ge \sqrt{\frac{{3{{{\bar{R}}}_1}}}{{{\gamma _1}(1 - 3\lambda _{\max }^2({\alpha _1}))}}} \end{aligned}$$

or

$$\begin{aligned} \left\| {{\zeta _a}(k)} \right\| \ge \sqrt{\frac{{{{{\bar{R}}}_1}}}{{{\gamma _2}{{{\underline{g}} }_M} - {\gamma _1}{\bar{g}}_M^2}}} \end{aligned}$$

or

$$\begin{aligned} \left\| {{\zeta _c}(k)} \right\| \ge \sqrt{\frac{{{{{\bar{R}}}_1}}}{{{\gamma _3} - {\gamma _{21}} - {\gamma _5}}}} \end{aligned}$$

or

$$\begin{aligned} \left\| {{\zeta _M}(k)} \right\| \ge \sqrt{\frac{{{{{\bar{R}}}_1}}}{{{\gamma _4}}}}. \end{aligned}$$
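
The thresholds can also be computed directly from the coefficients of the bound on $\Delta L(k)$: each negative quadratic term dominates ${\bar{R}}_1$ once its signal exceeds $\sqrt{{\bar{R}}_1/c}$, where $c$ is that term's coefficient. The sketch below does this under that reading of the bound; the function name, argument names and dictionary keys are hypothetical.

```python
import math
from typing import Dict


def ultimate_bounds(R1_bar: float, gamma1: float, gamma2: float,
                    gamma21: float, gamma3: float, gamma4: float,
                    gamma5: float, g_lower: float, g_upper: float,
                    lam_max: float) -> Dict[str, float]:
    """Thresholds sqrt(R1_bar / c) for each quadratic term in the bound on Delta L(k)."""
    coeffs = {
        "x": gamma1 * (1.0 / 3.0 - lam_max ** 2),
        "zeta_a": gamma2 * g_lower - gamma1 * g_upper ** 2,
        "zeta_c": gamma3 - gamma21 - gamma5,
        "zeta_M": gamma4,
    }
    for name, c in coeffs.items():
        # A non-positive coefficient means this term cannot dominate R1_bar.
        if c <= 0.0:
            raise ValueError(f"coefficient of {name} must be positive, got {c}")
    return {name: math.sqrt(R1_bar / c) for name, c in coeffs.items()}
```

Smaller values of ${\bar{R}}_1$ (for example, smaller approximation-error bounds) shrink all four thresholds at once, since each scales with $\sqrt{{\bar{R}}_1}$.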

About this article

Cite this article

Liang, Y., Zhang, H., Xiao, G. et al. Reinforcement learning-based online adaptive controller design for a class of unknown nonlinear discrete-time systems with time delays. Neural Comput & Applic 30, 1733–1745 (2018). https://doi.org/10.1007/s00521-018-3537-7

Received: 10 October 2017
Accepted: 11 May 2018
Published: 31 July 2018
Issue Date: September 2018
DOI: https://doi.org/10.1007/s00521-018-3537-7
