Value iteration based integral reinforcement learning approach for H∞ controller design of continuous-time nonlinear systems
Introduction
In many industrial applications, disturbances are present and typically degrade the performance of the controlled system. To handle this problem, H∞ control has been widely investigated and has become an essential part of robust control. The goal of H∞ control is to find a feedback controller for a given system that accounts for both robustness and control performance. In the early years, the H∞ control problem was studied for linear systems [1], [2]. Later, several researchers [3], [4], [5], [6], [7], [8] developed the H∞ control theory for nonlinear systems. The work of [6] showed that the H∞ control problem is equivalent to a two-player zero-sum differential game, whose Nash equilibrium solution is obtained by solving an equation called the Hamilton–Jacobi–Isaacs (HJI) equation, a nonlinear partial differential equation (PDE). In the linear case, the HJI equation reduces to a Riccati equation that can be solved efficiently. In the nonlinear case, however, there is still no approach to solve the HJI equation analytically. This has inspired researchers to study approaches for solving the HJI equation approximately; some direct approaches were proposed early on [4], [9], but they were limited by their computational load. In recent years, researchers have developed indirect approaches that approximate the solution of the HJI equation by introducing reinforcement learning (RL) techniques.
Over the last several decades, RL has been widely studied [10], [11], [12], [13]; it attempts to imitate the way mammals learn naturally. The concept of RL is learning how to map situations to actions so as to maximize a numerical reward signal [12]. Unlike most other forms of machine learning, the learner is not told which actions to take, but must instead discover which actions yield the greatest reward by trying them. Moreover, in the RL setting, actions may affect not only the immediate reward but also the next situation and, through that, all subsequent rewards. Because of this distinguishing feature, some researchers [14], [15], [16] introduced the idea of RL into the optimal control of nonlinear systems and proposed an actor–critic structure to approximately solve a nonlinear PDE called the Hamilton–Jacobi–Bellman (HJB) equation. This RL-based technique is known as approximate dynamic programming, or adaptive dynamic programming (ADP). Since the HJI equation is also a nonlinear PDE, much attention has been devoted to applying this RL-based technique to seek the solution of the HJI equation [17], [18], [19], [20], [21], [22], [23], [24], [25], [26], [27], [28], [29], [30], [31], [32]. Generally, there are two typical ways to solve such PDEs within the ADP framework: policy iteration (PI) and value iteration (VI) [14].
For the H∞ control problem of continuous-time (CT) nonlinear systems, various PI-based methods have been studied. One feature of PI is that the policy evaluation step requires solving for the value function associated with an admissible control policy [33], [34]. In [17], [20], the authors proved that the HJI equation can be solved by PI, and showed iterative convergence to the available storage function associated with a given L2-gain. In [18], [19], the finite-horizon H∞ control problem was studied using PI. In [21], [26], an extended PI-based method was proposed that can handle systems with unknown drift dynamics and be implemented online. In [22], a PI-based method was proposed to show that the mixed optimum of the zero-sum game can be derived even when the saddle-point solution does not exist. In [23], [24], the authors designed PI-based algorithms that seek the solution of the HJI equation using only one neural network. In [25], the authors developed a PI-based integral reinforcement learning algorithm [34] for the H∞ control of unknown CT linear systems. In [28], a novel PI-based technique called off-policy learning was introduced to solve the HJI equation, in which arbitrary policies, rather than the policy being evaluated, can be applied to generate the system data that tune the algorithm. The authors of [29], [31] extended the off-policy technique to design H∞ controllers for unknown CT nonlinear systems. Although various well-developed methods have been proposed for the H∞ controller design of CT nonlinear systems, all of them are based on PI and therefore assume an initial admissible control [33]. From a mathematical point of view, an admissible control can be regarded as a suboptimal control, and obtaining one requires solving nonlinear partial differential equations analytically. Ensuring admissibility can therefore be a seriously restrictive condition in practice.
To the best of our knowledge, there is still no general approach to obtain such a control, especially for nonlinear systems subject to disturbances.
On the other hand, the learning mechanism of VI allows more freedom in the initial condition than PI, since the admissible-control assumption is not required [35], [36], [37], [38], [39], [40], [41]. In [35], the convergence of VI with an initial zero value function was proved for the optimal control of discrete-time (DT) nonlinear systems. In [39], the authors analysed the convergence of VI for the optimal control of DT nonlinear systems in a more general setting, where the algorithm can be initialized with an arbitrary positive value function. Owing to this relaxed initial condition, some researchers introduced VI to solve the H∞ control problem for DT systems [42], [43]. In [42], the authors embedded the VI learning mechanism into a Q-learning method for the H∞ control of DT linear systems. In [43], the authors developed a VI-based algorithm to seek the solution of the zero-sum game for DT nonlinear systems, which is equivalent to the solution of the HJI equation associated with the H∞ control problem. However, the above works address DT systems only. Discussions on solving the H∞ control problem by VI for CT nonlinear systems are scarce, which motivates our research.
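The contrast with PI can be illustrated by the classical discrete-time VI, which starts from the zero value function and needs no admissible initial policy. The toy MDP below (transition probabilities, rewards and discount factor) is entirely made up for illustration and is not an example from this paper:

```python
import numpy as np

# Hypothetical 3-state, 2-action MDP; all numbers are illustrative.
# P[a][s, t] = probability of moving from state s to state t under action a.
P = np.array([
    [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.8]],  # action 0
    [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]],  # action 1
])
r = np.array([[0.0, 1.0], [0.0, 1.0], [1.0, 0.0]])  # r[s, a] = reward
gamma = 0.9                                         # discount factor

V = np.zeros(3)  # VI may start from the zero value function (no admissible policy)
for _ in range(500):
    # Bellman optimality backup: Q(s, a) = r(s, a) + gamma * sum_t P(t|s, a) V(t)
    Q = r + gamma * np.einsum('ast,t->sa', P, V)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-10:  # stop once the backup is a fixed point
        V = V_new
        break
    V = V_new

policy = Q.argmax(axis=1)  # greedy policy recovered after convergence
```

Because the backup is a gamma-contraction, the iteration converges geometrically regardless of the initial value function, which is the freedom that PI lacks.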
In this paper, a novel VI based integral reinforcement learning method is proposed to design the H∞ controller for CT nonlinear systems. First, the algorithm is constructed by introducing the VI learning mechanism into integral reinforcement learning to solve the HJI equation arising in the H∞ control of CT nonlinear systems. Since the proposed method is based on the VI learning mechanism, it admits a more general initial condition than PI-based methods, which require an initial admissible control for implementation. The iterative property of the value function is analysed with an arbitrary initial positive function, and the H∞ controller can be derived as the iteration converges. For the implementation of the proposed method, three neural networks are introduced to approximate the iterative value function, the iterative control policy and the iterative disturbance policy, respectively. Finally, two simulation cases are presented to illustrate the effectiveness of the proposed method.
Problem statement
Consider the CT nonlinear system described as
ẋ(t) = f(x(t)) + g(x(t))u(t) + k(x(t))d(t),
y(t) = h(x(t)),
where x ∈ Rⁿ is the system state vector, u ∈ Rᵐ is the control input, d ∈ Rᵠ is the external disturbance and y is the output. The dynamics of the system f(x) and g(x) are Lipschitz continuous on a set Ω ⊆ Rⁿ and satisfy f(0) = 0. The output dynamic h(x) satisfies the zero-state observability.
The control objective of H∞ controller design is to seek a control policy u(x) that ensures the asymptotic stability of the closed-loop system and guarantees that the L2-gain from the disturbance to the regulated output is no larger than a prescribed attenuation level γ.
Main results
First, for convenience, the following definition is proposed.
Definition 1 Define x(t + T) as the state of system (1) obtained by integrating over a time interval of length T from x(t) under the control policy u(x) and disturbance policy d(x).
Based on Definition 1 and inspired by the work of [34], the VI based integral reinforcement learning algorithm for the H∞ control of CT nonlinear systems can be described as follows:
Value function update
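The update equations themselves were stripped from this snippet, but the VI-style integral update can be sketched in closed form for a scalar linear special case. Everything below is an illustrative assumption, not the paper's neural-network implementation: a hypothetical scalar plant dx/dt = a·x + b·u + k·d, a quadratic value function V_i(x) = p_i·x², made-up cost weights, and an integration horizon T over which the closed-loop trajectory is available analytically:

```python
import numpy as np

# Illustrative scalar plant and cost integrand q*x^2 + r*u^2 - gamma^2*d^2.
a, b, k = -1.0, 1.0, 1.0
q, r, gamma = 1.0, 1.0, 2.0
T = 0.1           # horizon of one integral-reinforcement step

p = 0.0           # V_0(x) = p*x^2 with p = 0: no admissible policy needed
for _ in range(500):
    # Policies induced by V_i(x) = p*x^2:
    #   u_i(x) = -(b*p/r)*x,   d_i(x) = (k*p/gamma^2)*x
    lam = a - (b**2 / r) * p + (k**2 / gamma**2) * p      # closed-loop pole
    c = q + (b**2 / r) * p**2 - (k**2 / gamma**2) * p**2  # running-cost weight
    # Integral Bellman (VI) update: with x(tau) = x0*exp(lam*tau),
    #   V_{i+1}(x0) = int_0^T c*x(tau)^2 dtau + V_i(x(T)),
    # which for quadratic kernels reduces to a scalar recursion in p.
    e = np.exp(2.0 * lam * T)
    p = c * (e - 1.0) / (2.0 * lam) + p * e

# For these numbers the fixed point solves the game ARE
#   2*a*p + q - p^2*(b^2/r - k^2/gamma^2) = 0,
# whose positive root is (-2 + sqrt(7)) / 1.5.
p_are = (-2.0 + np.sqrt(7.0)) / 1.5
```

The recursion contracts toward the positive root of the game algebraic Riccati equation, mirroring the claimed convergence of the value function iteration from an initial (here zero) kernel.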
Implementation of the proposed method
For the implementation of ADP-based methods, approximation tools such as neural networks or fuzzy basis functions are required to approximate the solutions in (14)–(16). In this paper, three three-layer back-propagation neural networks, named the critic neural network, the actor neural network and the disturbance neural network, are introduced to approximate Vi(x), ui(x) and di(x), respectively.
The three-layer back propagation neural network can be described as
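The network equation was cut from this snippet; a common three-layer (input, hidden, output) form is ŷ(x) = W₂ᵀσ(W₁ᵀx + b₁) + b₂. As a minimal sketch under assumed details (tanh activations, made-up layer sizes, a toy quadratic target standing in for the value function), such a critic-style network can be trained by plain back-propagation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Three-layer network V_hat(x) = W2 @ tanh(W1 @ x + b1) + b2, fitted to a
# toy quadratic value function V(x) = x^2 on [-1, 1]; all sizes and the
# learning rate are illustrative choices, not the paper's settings.
W1 = rng.normal(0.0, 1.0, (16, 1)); b1 = np.zeros((16, 1))
W2 = rng.normal(0.0, 0.1, (1, 16)); b2 = np.zeros((1, 1))

x = np.linspace(-1.0, 1.0, 64).reshape(1, -1)   # training inputs, shape (1, N)
y = x**2                                         # targets
lr = 0.05

def forward(x):
    h = np.tanh(W1 @ x + b1)     # hidden-layer activations
    return W2 @ h + b2, h

loss0 = np.mean((forward(x)[0] - y) ** 2)
for _ in range(2000):
    yhat, h = forward(x)
    err = yhat - y                               # (1, N) output error
    gW2 = err @ h.T / x.shape[1]                 # output-layer gradients
    gb2 = err.mean(axis=1, keepdims=True)
    back = (W2.T @ err) * (1.0 - h**2)           # backprop through tanh
    gW1 = back @ x.T / x.shape[1]                # hidden-layer gradients
    gb1 = back.mean(axis=1, keepdims=True)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1
loss1 = np.mean((forward(x)[0] - y) ** 2)
```

The actor and disturbance networks would follow the same structure, with their tuning signals taken from the respective update rules rather than a supervised target.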
Simulation
Two simulation cases are carried out in this section to show the effectiveness of the VI based integral reinforcement learning method. The first is a linear case, for which the optimal solution can be obtained from the algebraic Riccati equation (ARE) and compared with the obtained results. The second is a nonlinear case that validates the theoretical results for nonlinear systems.
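For a linear benchmark of this kind, the game-theoretic ARE AᵀP + PA + Q − P(BR⁻¹Bᵀ − γ⁻²KKᵀ)P = 0 can be solved numerically via the stable invariant subspace of the associated Hamiltonian matrix. The system matrices below are illustrative placeholders, not the paper's simulation example:

```python
import numpy as np

# Illustrative stable 2-state system with one control and one disturbance input.
A = np.array([[-1.0, 1.0], [0.0, -2.0]])
B = np.array([[0.0], [1.0]])    # control input matrix
K = np.array([[0.5], [0.0]])    # disturbance input matrix
Q = np.eye(2)
R = np.array([[1.0]])
gamma = 2.0                     # prescribed attenuation level

# Combined quadratic term of the game ARE: S = B R^{-1} B^T - gamma^{-2} K K^T.
S = B @ np.linalg.inv(R) @ B.T - (1.0 / gamma**2) * (K @ K.T)

# Hamiltonian matrix; its stable invariant subspace [X; Y] yields P = Y X^{-1}.
H = np.block([[A, -S], [-Q, -A.T]])
w, V = np.linalg.eig(H)
idx = np.where(w.real < 0)[0]   # eigenvalues come in +/- pairs: two are stable
X, Y = V[:2, idx], V[2:, idx]
P = np.real(Y @ np.linalg.inv(X))
```

The resulting P gives the saddle-point value function x ᵀP x against which a learned value-function kernel can be checked.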
Conclusions
In this paper, a novel integral reinforcement learning approach based on VI was developed for designing the H∞ controller of CT nonlinear systems. The proposed algorithm does not require an admissible control for its implementation and thus admits a more general initial condition than PI-based methods. The iterative property of the value function was analysed with an arbitrary initial positive function, and the H∞ controller could be derived as the iteration converges. For the implementation, three neural networks were introduced to approximate the iterative value function, control policy and disturbance policy, and two simulation cases illustrated the effectiveness of the proposed method.
Acknowledgment
This work was supported by the National Natural Science Foundation of China (61433004), the IAPI Fundamental Research Funds (2013ZCX14), and the Development Project of the Key Laboratory of Liaoning Province.
Geyang Xiao received the B.S. degree in Automation Control from Northeastern University, Shenyang, China, in 2012. He has been pursuing the Ph.D. degree with Northeastern University, Shenyang, China, since 2012. His current research interests include reinforcement learning, neural-network-based control, nonlinear optimal control, adaptive dynamic programming, and their industrial applications.
References (47)
- et al., An iterative adaptive dynamic programming method for solving a class of nonlinear zero-sum differential games, Automatica (2011)
- et al., H∞ control of linear discrete-time systems: Off-policy reinforcement learning, Automatica (2017)
- et al., Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach, Automatica (2005)
- et al., Neural network approach to continuous-time direct adaptive optimal control for partially unknown nonlinear systems, Neural Netw. (2009)
- et al., Data-driven approximate value iteration with optimality error bound analysis, Automatica (2017)
- et al., Model-free Q-learning designs for linear discrete-time zero-sum games with application to H-infinity control, Automatica (2007)
- et al., Neural-network-based zero-sum game for discrete-time nonlinear systems via iterative adaptive dynamic programming algorithm, Neurocomputing (2013)
- et al., Integral reinforcement learning and experience replay for adaptive optimal control of partially-unknown constrained-input continuous-time systems, Automatica (2014)
- et al., Optimal tracking control of nonlinear partially-unknown constrained-input systems using integral reinforcement learning, Automatica (2014)
- Feedback and optimal sensitivity: model reference transformations, multiplicative seminorms, and approximate inverses, IEEE Trans. Autom. Control (1981)
- State-space solutions to standard H2 and H∞ control problems, IEEE Trans. Autom. Control
- L2-gain analysis of nonlinear systems and nonlinear state-feedback H∞ control, IEEE Trans. Autom. Control
- Numerical approach to computing nonlinear H-infinity control laws, J. Guid. Control Dyn.
- H∞ control via measurement feedback for general nonlinear systems, IEEE Trans. Autom. Control
- H∞ Optimal Control and Related Minimax Design Problems
- Successive Galerkin approximation algorithms for nonlinear optimal and robust control, Int. J. Control
- Dynamic Noncooperative Game Theory
- Users guide to viscosity solutions of second order partial differential equations, Bull. Am. Math. Soc.
- Neuro-Dynamic Programming
- Reinforcement Learning: An Introduction
- Mastering the game of go with deep neural networks and tree search, Nature
Huaguang Zhang received the B.S. degree and the M.S. degree in control engineering from Northeast Dianli University of China, Jilin City, China, in 1982 and 1985, respectively. He received the Ph.D. degree in thermal power engineering and automation from Southeast University, Nanjing, China, in 1991. He joined the Department of Automatic Control, Northeastern University, Shenyang, China, in 1992, as a Postdoctoral Fellow for two years. Since 1994, he has been a Professor and Head of the Institute of Electric Automation, School of Information Science and Engineering, Northeastern University, Shenyang, China. His main research interests are fuzzy control, stochastic system control, neural networks based control, nonlinear control, and their applications. He has authored and coauthored over 200 journal and conference papers, four monographs and co-invented 20 patents.
Kun Zhang received the B.S. degree in mathematics and applied mathematics from Hebei Normal University, Shijiazhuang, China, in 2012 and the M.S. degree in management science and engineering from Northwest University for Nationalities, Lanzhou, China, in 2015. He is currently pursuing the Ph.D. degree in control theory and control engineering at Northeastern University, Shenyang, China. His main research interests include reinforcement learning, dynamic programming, neural networks-based controls and their industrial applications.
Yinlei Wen received the B.S. degree in automation control in 2012 and the M.S. degree in control engineering in 2014 from Northeastern University, Shenyang, China. He has been pursuing the Ph.D. degree at Northeastern University, Shenyang, China, since 2015. His current research covers neural adaptive dynamic programming, neural networks, nonlinear control and their industrial applications.