Neurocomputing

Volume 198, 19 July 2016, Pages 91-99

Neural network-based online H∞ control for discrete-time affine nonlinear system using adaptive dynamic programming

https://doi.org/10.1016/j.neucom.2015.08.120

Abstract

In this paper, the problem of H∞ control design for affine nonlinear discrete-time systems is addressed by using adaptive dynamic programming (ADP). First, the nonlinear H∞ control problem is transformed into a two-player zero-sum differential game problem for the nonlinear system. Then, critic, action and disturbance networks are designed by using neural networks to solve online the Hamilton–Jacobi–Isaacs (HJI) equation associated with this game. When the novel weight update laws for the critic, action and disturbance networks are tuned online using data generated in real time along the system trajectories, it is shown by Lyapunov techniques that the system states and the weight estimation errors of all three neural networks are uniformly ultimately bounded. Further, it is shown that the output of the action network approaches the optimal control input with a small bounded error, and the output of the disturbance network approaches the worst-case disturbance with a small bounded error. Finally, simulation results are presented to demonstrate the effectiveness of the new ADP-based method.

Introduction

It is well known that the control performance of practical systems is often degraded by unknown disturbances such as measurement noise, input disturbances and other exogenous signals, which arise in most applications because of plant interactions with the environment. H∞ control is one of the most powerful control methods for attenuating the effect of disturbances in dynamical systems [1]. The formulation of H∞ control for dynamical systems was studied in the framework of Hamilton–Jacobi equations by van der Schaft [2] and Isidori and Astolfi [3]. Notably, conditions for the existence of smooth solutions of the Hamilton–Jacobi equation were studied in [2] through invariant manifolds of Hamiltonian vector fields and their relation to the Hamiltonian matrices of the corresponding Riccati equation for the linearized problem. Some of these conditions were relaxed into critical and noncritical cases by Isidori and Astolfi [3]. Later, Basar and Bernhard [4] showed that the H∞ control problem can be posed as a two-person zero-sum differential game, in which the control input is the minimizing player and the unknown disturbance is the maximizing player. Although nonlinear H∞ control theory is well developed, the main bottleneck for its practical application is the need to solve the Hamilton–Jacobi–Isaacs (HJI) equation, which is difficult or impossible to solve analytically and may not admit a global analytic solution [5]. Therefore, solving the HJI equation remains a challenge.
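For reference, the game-theoretic statement takes the following standard form (paraphrased here, not quoted from the paper): the control and the disturbance optimize a common value, and the H∞ solution is the saddle point of this game.

```latex
% Zero-sum game statement of the H-infinity problem (standard form):
% u is the minimizing player, d the maximizing player.
V^{*}(x_k) \;=\; \min_{u}\,\max_{d}\; \sum_{i=k}^{\infty}
  \Bigl( z(i)^{T} z(i) \;-\; \gamma^{2}\, d(i)^{T} d(i) \Bigr)
% Saddle-point (Nash) condition satisfied by the optimal pair (u*, d*):
% V(x; u*, d) <= V(x; u*, d*) <= V(x; u, d*).
```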

Over the past decades, several methods have been proposed to solve the HJI equation [6], [7], [8]. In [6], a smooth solution of the HJI equation was determined directly by solving for the coefficients of a Taylor series expansion of the value function in a very efficient manner. Beard and McLain [7] proposed an iterative policy method that successively solves the HJI equation by breaking the nonlinear differential equation into a sequence of linear differential equations. Building on [6] and [7], a similar iterative policy method was applied in [8] to the HJI equation for nonlinear systems with input constraints.

In recent years, adaptive dynamic programming (ADP) [9], [10], [11], [12], [13] has emerged as a promising methodology for solving H∞ control problems [15], [16], [17], [18], [19], [20], [21], [22], [23]. ADP is a machine learning method for learning feedback control laws online in real time from system performance, without necessarily knowing the system dynamics, thereby overcoming the curse of dimensionality [14] inherent in dynamic programming. Al-Tamimi et al. [15] derived adaptive critic designs corresponding to heuristic dynamic programming and dual heuristic dynamic programming to solve online the H∞ control problem of linear discrete-time systems in a forward-in-time manner. Building on this work, the authors of [16] proposed an iterative adaptive critic design algorithm to find the optimal controller for a class of discrete-time two-person zero-sum games for Roesser-type 2-D systems. Further, a novel data-based adaptive critic design using output feedback was proposed for unknown discrete-time zero-sum games [17]. In addition, Q-learning-based optimal strategies were proposed for the H∞ optimal control problem without knowledge of the system matrices in [18] and [19]. For the nonlinear case, Mehraeen et al. [20], [21] developed an off-line iterative approach to solve the HJI equation using successive approximation. Liu et al. [22] proposed value iteration methods corresponding to heuristic dynamic programming and dual heuristic dynamic programming to solve the HJI equation for constrained-input systems. Later, Liu et al. [23] proposed an iterative adaptive dynamic programming algorithm to solve zero-sum game problems for affine nonlinear discrete-time systems. Nevertheless, a common feature of the above ADP-based results for the H∞ control problem is that sequential iterative procedures with more than one iteration loop are used to solve the HJI equation, i.e., the value function and the control and disturbance policies are updated asynchronously. Such a procedure may lead to redundant iterations and low efficiency [24], which motivates the work of this paper.

In this paper, a new ADP-based method is proposed to solve online the H∞ control problem of nonlinear systems, in which three online parametric structures, implemented as neural networks, are designed to solve online the Hamilton–Jacobi–Isaacs equation arising in the nonlinear H∞ control problem. The main contributions of this paper are twofold. First, we present a new ADP-based method in which the weights of the three online parametric structures are tuned simultaneously along the system trajectories to converge to the solution of the HJI equation, in contrast to the sequential algorithms in [15], [16], [17], [18], [19], [20], [21], [22], [23]. Second, explicitly accounting for the neural network approximation errors, unlike the works [20], [22], Lyapunov theory is used to show that the system states and the weight estimation errors of the three online parametric structures are uniformly ultimately bounded. Moreover, it is shown that the pair of approximated control and disturbance signals converges to the approximate Nash equilibrium solution of the two-player zero-sum differential game.

The remainder of this paper is organized as follows. In Section 2, the problem statement is given. In Section 3, we present a new ADP-based method for solving the HJI equation of nonlinear discrete-time systems, together with a rigorous proof of convergence. Section 4 presents an example demonstrating the effectiveness of the proposed method. Finally, conclusions are drawn in Section 5.


Problem formulation

In this paper, we consider the following affine nonlinear discrete-time system in the presence of a disturbance $d(k)$:
$$x_{k+1} = f(x_k) + g(x_k)\,u(k) + d(k), \qquad z(k) = \bigl[\,(C x_k)^T \;\; (D u(k))^T\,\bigr]^T,$$
where $x_k \in \mathbb{R}^n$ is the system state, $u(k) \in \mathbb{R}^m$ is the control input, $d(k) \in \mathbb{R}^n$ is the disturbance signal with $d(k) \in L_2[0,\infty)$, and $z(k)$ is the fictitious output of the system. Assume that $\|g(x_k)\|_F \le g_M$ [25], where $\|\cdot\|_F$ denotes the Frobenius norm.
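Since $z(k)$ stacks $Cx_k$ and $Du(k)$, the output penalty $z(k)^T z(k)$ reduces to the familiar quadratic state and control costs. Identifying $Q = C^T C$ and $R = D^T D$ (an identification assumed here for illustration, consistent with the cost function (6) referenced later):

```latex
z(k)^{T} z(k)
  \;=\; x_k^{T} C^{T} C\, x_k \;+\; u(k)^{T} D^{T} D\, u(k)
  \;=\; x_k^{T} Q\, x_k \;+\; u(k)^{T} R\, u(k),
\qquad Q := C^{T} C, \quad R := D^{T} D .
```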

The H∞ control problem for the nonlinear discrete-time system (1), (2) is to find a state feedback control $u(k) = u(x_k)$ such that the closed-loop system is asymptotically stable and the effect of the disturbance $d(k)$ on the fictitious output $z(k)$ is attenuated to a prescribed level $\gamma$.
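The attenuation requirement, stated here in its standard form since the excerpt truncates, is that the closed-loop system have $L_2$-gain no greater than $\gamma$:

```latex
\sum_{k=0}^{\infty} z(k)^{T} z(k)
  \;\le\; \gamma^{2} \sum_{k=0}^{\infty} d(k)^{T} d(k)
  \qquad \text{for all } d \in L_{2}[0,\infty).
```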

Main results

In this section, we present a new neural-network-based ADP scheme for solving online the HJI equation (13). Since Eq. (13) is a nonlinear partial difference equation, it is difficult to obtain its exact solution $V(x_k)$. However, an approximate solution can be found by exploiting the universal approximation property of neural networks [25]. The value function $V(x_k)$ is therefore approximated by a neural network as
$$V(x_k) = W_c^T\,\psi_c(x_k) + \varepsilon_c(x_k),$$
where $W_c$ is the ideal critic weight vector, $\psi_c(x_k)$ is the vector of activation functions, and $\varepsilon_c(x_k)$ is the neural network approximation error.
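To make the three-network structure concrete, the sketch below implements gradient-descent tuning laws of the generic form used in online ADP: the critic descends a temporal-difference residual of the HJI equation, while the action and disturbance networks regress toward the minimizing and maximizing policies implied by the current critic. All names, basis functions and gains are illustrative assumptions; the paper's actual update laws include additional terms established in its Lyapunov analysis.

```python
import numpy as np

def psi(x):                    # critic basis: V_hat(x) = Wc @ psi(x)
    x1, x2 = x
    return np.array([x1 * x1, x1 * x2, x2 * x2])

def dpsi(x):                   # Jacobian d(psi)/dx, shape (3, 2)
    x1, x2 = x
    return np.array([[2 * x1, 0.0],
                     [x2,     x1],
                     [0.0,    2 * x2]])

def phi(x):                    # action / disturbance basis (scalar outputs)
    return np.array([x[0], x[1], x[0] * x[1]])

def adp_step(x, Wc, Wa, Wd, f, g, k_d, Q, R, gamma,
             ac=0.01, aa=0.01, ad=0.01):
    """One simultaneous online update along the system trajectory.

    Q is a 2x2 weight matrix; R is scalar because u is scalar here."""
    u = Wa @ phi(x)                         # action network output
    d = Wd @ phi(x)                         # disturbance network output
    xn = f(x) + g(x) * u + k_d * d          # next state under current policies

    # Critic: descend the squared HJI/Bellman temporal-difference residual.
    r = x @ Q @ x + R * u * u - gamma ** 2 * d * d
    e_c = Wc @ psi(x) - (r + Wc @ psi(xn))
    Wc = Wc - ac * e_c * (psi(x) - psi(xn))

    # Stationarity of r(x,u,d) + V(x_{k+1}) in u and d gives the targets.
    grad_V = dpsi(xn).T @ Wc                # dV/dx evaluated at x_{k+1}
    u_star = -0.5 / R * (g(x) @ grad_V)     # critic-implied minimizer
    d_star = 0.5 / gamma ** 2 * (k_d @ grad_V)   # critic-implied maximizer

    Wa = Wa - aa * (u - u_star) * phi(x)    # action network regression
    Wd = Wd - ad * (d - d_star) * phi(x)    # disturbance network regression
    return xn, Wc, Wa, Wd
```

The targets $u^*$ and $d^*$ follow from setting the derivative of the stage cost plus $V(x_{k+1})$ with respect to $u$ and $d$ to zero, which is where the affine-in-input structure of the system pays off.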

Simulation results

In this section, we solve a numerical example using the algorithm developed in Section 3. Consider the following nonlinear discrete-time system studied in [21]:
$$\begin{bmatrix} x_1(k+1) \\ x_2(k+1) \end{bmatrix} = \begin{bmatrix} 0.8\,x_2(k) \\ \sin\bigl(0.8\,x_1(k)\,x_2(k)\bigr) + 1.8\,x_2(k) \end{bmatrix} + \begin{bmatrix} 0 \\ x_2(k) \end{bmatrix} u(k) + \begin{bmatrix} 0 \\ 1 \end{bmatrix} d(k),$$
with the initial state taken as $x(0) = [0.1 \;\; 0.1]^T$. The corresponding cost function is defined as in (6), where $Q$ and $R$ are chosen as identity matrices of appropriate dimensions and $\gamma = 20$. The initial stabilizing controller is $u_0(k) = x_1(k) + 1.5\,x_2(k)$.
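As a quick plumbing check before running the full algorithm, one can roll out this benchmark under the initial controller $u_0$ with the disturbance switched off. The transcription below follows the dynamics as printed above; any signs lost in the journal-page rendering (inside $\sin(\cdot)$, in $g(x)$, and in $u_0$) are assumptions.

```python
import numpy as np

def f(x):                                   # drift term, as printed
    x1, x2 = x
    return np.array([0.8 * x2,
                     np.sin(0.8 * x1 * x2) + 1.8 * x2])

def g(x):                                   # input vector field, as printed
    return np.array([0.0, x[1]])

k_d = np.array([0.0, 1.0])                  # disturbance enters the 2nd state

x = np.array([0.1, 0.1])                    # initial state from the paper
for k in range(20):
    u = x[0] + 1.5 * x[1]                   # u0(k), as printed
    x = f(x) + g(x) * u + k_d * 0.0         # disturbance off for this check
    print(k, x)
```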


Conclusion

In this paper, we have proposed a new ADP-based method to solve online the H∞ control problem for affine nonlinear discrete-time systems. The significance of the proposed method lies in the simultaneous tuning of the weights of the critic, action and disturbance networks using data generated in real time along the system trajectories, so that the solution of the HJI equation is obtained online without solving the equation directly. The convergence of the new ADP-based method has been established by Lyapunov techniques, showing that the system states and all weight estimation errors are uniformly ultimately bounded.

Acknowledgments

The work described in this paper was supported by the National Natural Science Foundation of China (61034005, 61273027, 61304132, U1504615), the National High Technology Research and Development Program of China (2012AA040104), the Liaoning Industry Program (2013219005) and the China Postdoctoral Science Foundation funded project (2015M572104).


References (30)

  • M. Abu-Khalaf et al., Policy iterations and the Hamilton–Jacobi–Isaacs equation for H∞ state feedback control with input saturation, IEEE Trans. Autom. Control (2006)
  • H. Zhang et al., Neural-network-based near-optimal control for a class of discrete-time affine nonlinear systems with control constraints, IEEE Trans. Neural Netw. (2009)
  • H. Modares et al., Integral reinforcement learning and experience replay for adaptive optimal control of partially-unknown constrained-input continuous-time systems, Automatica (2014)
  • Y. Jiang et al., Robust adaptive dynamic programming and feedback stabilization of nonlinear systems, IEEE Trans. Neural Netw. Learn. Syst. (2014)
  • H. Zhang et al., Adaptive Dynamic Programming for Control: Algorithms and Stability, Springer (2013)

Chunbin Qin received the B.S. and M.S. degrees from the School of Computer and Information Engineering, Henan University, Kaifeng, China, in 2004 and 2009, respectively. He received the Ph.D. degree in Power Electronics and Power Transmission from Northeastern University, Shenyang, China, in 2014. He is currently a lecturer at Henan University. His current research interests include adaptive dynamic programming, neural networks, adaptive control, optimal control, game theory, and their industrial applications.

Huaguang Zhang received the B.S. and M.S. degrees in Control Engineering from Northeast Dianli University, Jilin City, China, in 1982 and 1985, respectively. He received the Ph.D. degree in Thermal Power Engineering and Automation from Southeast University, Nanjing, China, in 1991. His main research interests are fuzzy control, stochastic system control, neural-network-based control, adaptive dynamic programming, nonlinear control, and their applications.

Zhang is an Associate Editor of Automatica, IEEE Transactions on Fuzzy Systems, IEEE Transactions on Cybernetics, and Neurocomputing. He was awarded the Outstanding Youth Science Foundation Award by the National Natural Science Foundation Committee of China in 2003, and was named a Cheung Kong Scholar by the Ministry of Education of China in 2005.

Yingchun Wang was born in Liaoning Province, China, in 1974. He received the B.S., M.S., and Ph.D. degrees from Northeastern University, Shenyang, China, in 1997, 2003, and 2006, respectively. Since 2006, he has been with the School of Information Science and Engineering, Northeastern University. He is also currently with the Key Laboratory of Integrated Automation of Process Industry (Northeastern University), National Education Ministry, Shenyang. His research interests include fuzzy control and fuzzy systems, stochastic control, time-delay systems, and nonlinear systems.

Yanhong Luo received the B.S. degree in Automation Control and the M.S. and Ph.D. degrees in Control Theory and Control Engineering from Northeastern University, Shenyang, China, in 2003, 2006 and 2009, respectively. She is currently an Associate Professor at Northeastern University. Her research interests include approximate dynamic programming, neural-network-based adaptive control, fuzzy control, and their industrial applications.
