Abstract
This paper is concerned with an online adaptive control strategy for a class of unknown nonlinear discrete-time systems with time delays. The main objective is to establish an online adaptive control strategy based on a reinforcement learning (RL) algorithm, so that the nonquadratic performance index is minimized and the closed-loop system with time delays is stable. To simplify the control of systems with time delays, a time-delay function is designed to eliminate the control-delay term. Then, an online adaptive control algorithm based on RL is presented to approximate a suitable control law and optimize the long-term performance function. On the basis of Lyapunov theory, it is proved that the online adaptive controller is effective and that all signals of the closed-loop system are uniformly ultimately bounded. The simulation results indicate the validity and feasibility of the proposed adaptive control strategy.
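RL-based adaptive schemes of this kind typically pair a critic that estimates the long-term cost with an actor that produces the control. The following is a minimal, hypothetical actor-critic sketch in Python, not the authors' algorithm: the scalar plant \(x(k+1) = 0.5\sin x(k) + 0.2\,x(k-\sigma) + u(k)\), the quadratic stage cost (the paper uses a nonquadratic index), the polynomial features, the discount factor and the learning rates are all assumptions, and the paper's model network and delay-compensation function are omitted.

```python
import math


def run_actor_critic(steps=200, sigma=2, beta=0.95, eta_c=0.01, eta_a=0.01):
    """Hypothetical online actor-critic loop on a toy scalar plant with a state delay.

    Assumed plant:      x(k+1) = 0.5*sin(x(k)) + 0.2*x(k-sigma) + u(k)
    Assumed stage cost: r(k) = x(k)^2 + u(k)^2 (quadratic, for simplicity)
    """
    xs = [1.0] * (sigma + 1)   # history x(-sigma), ..., x(0)
    w_c = [0.0, 0.0, 0.0]      # critic weights for features [x^2, x*x_d, x_d^2]
    w_a = [0.0, 0.0]           # actor weights for features [x, x_d]

    def phi_c(x, xd):
        return [x * x, x * xd, xd * xd]

    def critic(x, xd):
        # Linear-in-weights cost-to-go estimate J_hat(x, x_d).
        return sum(w * f for w, f in zip(w_c, phi_c(x, xd)))

    for _ in range(steps):
        x, xd = xs[-1], xs[-1 - sigma]          # x(k), x(k - sigma)
        # Actor output, clipped only to keep this toy loop bounded.
        u = max(-1.0, min(1.0, w_a[0] * x + w_a[1] * xd))
        x_next = 0.5 * math.sin(x) + 0.2 * xd + u
        xd_next = xs[-sigma]                    # x(k + 1 - sigma)
        r = x * x + u * u

        # Critic: semi-gradient TD(0) step on the Bellman residual.
        td = r + beta * critic(x_next, xd_next) - critic(x, xd)
        feats = phi_c(x, xd)
        for i in range(3):
            w_c[i] += eta_c * td * feats[i]

        # Actor: gradient step on r + beta*J_hat(x_next) w.r.t. u,
        # assuming dx_next/du = 1 and ignoring the clip in the gradient.
        dJ_dx = 2 * w_c[0] * x_next + w_c[1] * xd_next
        dq_du = 2 * u + beta * dJ_dx
        w_a[0] -= eta_a * dq_du * x
        w_a[1] -= eta_a * dq_du * xd

        xs.append(x_next)
    return xs, w_c, w_a


xs, w_c, w_a = run_actor_critic()
print(f"final state {xs[-1]:.4f}, critic weights {w_c}")
```

Calling `run_actor_critic()` returns the state trajectory together with the final critic and actor weights. Both networks are updated at every step from measured data, which is the "online" aspect; the input clipping is a convenience for this toy loop and not part of the reported design.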
Acknowledgements
This work was supported by the National Natural Science Foundation of China (61433004, 61627809, 61621004) and the IAPI Fundamental Research Funds (2013ZCX14).
Ethics declarations
Conflict of interest
The authors declare that they have no conflicts of interest regarding this work.
Appendix
1.1 Proof of Theorem 1
Consider the Lyapunov candidate function defined as
where \(\gamma_i \in \mathbb{R}^+\), \(i = 1, 2, 3, 4, 5\), are design parameters.
Let \(L_1(k) = \frac{\gamma_1}{3} x^T(k) x(k)\); then the first-order difference of \(L_1(k)\) is
where \(\lambda_{\max}(\alpha_1)\) is the maximum eigenvalue of the matrix \(\alpha_1\).
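For reference, two standard facts behind this kind of step, stated under our assumptions that the difference is taken forward, \(\Delta L_1(k) = L_1(k+1) - L_1(k)\), and that \(\alpha_1\) is symmetric:

\[
\Delta L_1(k) = \frac{\gamma_1}{3}\left(x^T(k+1)x(k+1) - x^T(k)x(k)\right), \qquad x^T(k)\,\alpha_1\,x(k) \le \lambda_{\max}(\alpha_1)\,\|x(k)\|^2 .
\]

The second inequality is the Rayleigh-quotient bound, which is how \(\lambda_{\max}(\alpha_1)\) typically enters such an estimate.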
Let \(L_2(k) = \frac{\gamma_2}{\eta_a} \tilde{w}_a^T(k) \tilde{w}_a(k)\); then the first-order difference of \(L_2(k)\) is given by
Subtracting \(w_a\) from both sides of (34) gives
Substituting (43) into (42) yields
Adding and subtracting the term
\[
\frac{\left(1 - \eta_a \left\| \psi_a(s_a(k)) \right\|^2 \underline{g}_M\right)^2}{\underline{g}_M - \eta_a \left\| \psi_a(s_a(k)) \right\|^2 \bar{g}_M^2}\left[-g_M(x(k), x(k-\sigma))\,\delta_a + \hat{J}(k)\right]^2
\]
in the above equation gives
Define \(\gamma_2 = \gamma_{21}\gamma_{22}\), where \(\gamma_{22}\) satisfies
\[
\gamma_{22}\left[\frac{\left(1 - \eta_a \left\| \psi_a(s_a(k)) \right\|^2 \underline{g}_M\right)^2}{\underline{g}_M - \eta_a \left\| \psi_a(s_a(k)) \right\|^2 \bar{g}_M^2} + \eta_a \left\| \psi_a(s_a(k)) \right\|^2\right] \le \frac{1}{2}.
\]
Thus, \(\Delta L_2(k)\) becomes
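The admissibility condition on \(\gamma_{22}\) can be evaluated numerically. The sketch below (the function name is hypothetical) returns the largest \(\gamma_{22}\) allowed for given values of \(\eta_a\), \(\|\psi_a(s_a(k))\|^2\), \(\underline{g}_M\) and \(\bar{g}_M\); it assumes \(\underline{g}_M - \eta_a\|\psi_a(s_a(k))\|^2\bar{g}_M^2 > 0\), which keeps the bracketed factor positive.

```python
def gamma22_upper_bound(eta_a, psi_sq, g_min, g_max):
    """Largest gamma_22 with gamma_22 * bracket <= 1/2, where
    bracket = (1 - eta_a*psi_sq*g_min)**2 / (g_min - eta_a*psi_sq*g_max**2) + eta_a*psi_sq
    and psi_sq stands for ||psi_a(s_a(k))||^2.
    """
    denom = g_min - eta_a * psi_sq * g_max ** 2
    if denom <= 0:
        # Assumption of this sketch: the denominator must be positive.
        raise ValueError("requires g_min - eta_a * psi_sq * g_max**2 > 0")
    bracket = (1 - eta_a * psi_sq * g_min) ** 2 / denom + eta_a * psi_sq
    return 0.5 / bracket


# Illustrative values only (not taken from the paper).
print(gamma22_upper_bound(eta_a=0.1, psi_sq=1.0, g_min=1.0, g_max=1.5))  # ≈ 0.4366
```

Because \(\|\psi_a(s_a(k))\|^2\) varies with \(k\), the value returned applies to the particular activation norm supplied.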
Defining \(\tilde{w}_c(k) = \hat{w}_c(k) - w_c\) and subtracting \(w_c\) from both sides of (28), one gets
Thus, the first-order difference of \(L_3(k)\) can be obtained as
Adding and subtracting the term \(\gamma_3\left[\beta^{N-k+1} r(k) + \hat{w}_c^T(k)\psi_c(s_c(k)) - \hat{w}_c^T(k-1)\psi_c(s_c(k-1))\right]^2\) in (48), \(\Delta L_3(k)\) becomes
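The bracketed quantity combines the weighted stage cost with two consecutive critic outputs, so it appears to act as the critic's approximation (Bellman-type) error. A small Python helper that evaluates it and the \(\gamma_3\)-weighted square; the function names and the list-based vectors are hypothetical choices for illustration.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def critic_residual(beta, N, k, r_k, w_c_k, psi_c_k, w_c_prev, psi_c_prev):
    """beta^(N-k+1) r(k) + w_c(k)^T psi_c(s_c(k)) - w_c(k-1)^T psi_c(s_c(k-1))."""
    return beta ** (N - k + 1) * r_k + dot(w_c_k, psi_c_k) - dot(w_c_prev, psi_c_prev)


def added_term(gamma_3, residual):
    """The gamma_3-weighted square that is added and subtracted in (48)."""
    return gamma_3 * residual ** 2


# Illustrative numbers only.
e = critic_residual(beta=0.5, N=3, k=2, r_k=4.0,
                    w_c_k=[1.0, 2.0], psi_c_k=[0.5, 0.25],
                    w_c_prev=[1.0, 1.0], psi_c_prev=[0.5, 0.5])
print(e, added_term(2.0, e))  # 1.0 2.0
```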
Using the equality \(\hat{w}_c^T(k-1)\psi_c(s_c(k-1)) = w_c^T(k-1)\psi_c(s_c(k-1)) + \zeta_c(k-1)\) and the inequality \((m \pm n)^2 \le 2m^2 + 2n^2\), one can deduce
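The inequality follows from a one-line expansion, valid for any real scalars \(m\) and \(n\):

\[
2m^2 + 2n^2 - (m \pm n)^2 = m^2 \mp 2mn + n^2 = (m \mp n)^2 \ge 0 .
\]

For example, with \(m = 3\) and \(n = 1\) it gives \((3 + 1)^2 = 16 \le 2 \cdot 9 + 2 \cdot 1 = 20\).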
Let \(L_4(k) = \frac{\gamma_4}{\eta_M}\tilde{w}_M^T(k)\tilde{w}_M(k)\); then the first-order difference of \(L_4(k)\) is given by
Subtracting \(w_M\) from both sides of (15) gives
Substituting (52) into (51) yields
Adding and subtracting the term \(\gamma_4\left[\hat{u}(k-\tau) - u(k-\tau)\right]^2\), \(\Delta L_4(k)\) becomes
By invoking (7) and (8), one obtains
where \({\zeta _M}(k) = {{\tilde{w}}_M}{\psi _M}(u(k))\).
In addition, let \(L_5 = \gamma_5\left\| \zeta_c(k-1) \right\|^2\); then it is obvious that
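Assuming the forward difference \(\Delta L_5(k) = L_5(k+1) - L_5(k)\) (the usual discrete-time convention, taken here as an assumption), shifting the time index in \(L_5\) gives

\[
\Delta L_5(k) = \gamma_5\left(\left\| \zeta_c(k) \right\|^2 - \left\| \zeta_c(k-1) \right\|^2\right).
\]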
Combining (41), (46), (50), (55), and (56), the first-order difference of \(L(k)\) becomes
with
where \({{\bar{g}}_M}\) and \({{\underline{g}} _M}\) stand for the maximum and minimum eigenvalues of the matrix \({g_M}\), respectively.
Choose the parameters as
Hence, \(\Delta L(k)\) becomes
where
Thus, if one of the following inequalities holds, \(\Delta L(k) < 0.\)
or
or
or
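In Lyapunov proofs of this type, each alternative typically states that one signal is large enough for the negative terms of \(\Delta L(k)\) to dominate its bounded remainder. As a hedged illustration of the mechanism (the form below is assumed, not the paper's exact expression), suppose \(\Delta L(k) \le -c\|x(k)\|^2 + D_M\) with constants \(c > 0\) and \(D_M > 0\). Then

\[
\Delta L(k) < 0 \quad \text{whenever} \quad \|x(k)\| > \sqrt{D_M / c},
\]

so \(L(k)\) strictly decreases while the state lies outside this ball; this is the standard route from such inequalities to uniform ultimate boundedness.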
About this article
Cite this article
Liang, Y., Zhang, H., Xiao, G. et al. Reinforcement learning-based online adaptive controller design for a class of unknown nonlinear discrete-time systems with time delays. Neural Comput & Applic 30, 1733–1745 (2018). https://doi.org/10.1007/s00521-018-3537-7
DOI: https://doi.org/10.1007/s00521-018-3537-7