Online optimal control of unknown discrete-time nonlinear systems by using time-based adaptive dynamic programming
Introduction
The theory of optimal control is concerned with finding a control law for a given system and a user-defined optimality criterion. Traditional optimal control design methods are generally offline and require complete knowledge of the system dynamics [1]. Adaptive control techniques, on the other hand, are designed for online use with uncertain systems. However, classical adaptive control methods are generally far from optimal.
During the last few decades, reinforcement learning (RL) [2], [3], [4] has successfully provided a way to bring together the advantages of adaptive and optimal control. A class of RL-based adaptive optimal controllers, called approximate/adaptive dynamic programming (ADP), was first developed by Werbos [5], [6]. Extensions of RL-based controllers to discrete-time (DT) systems have been considered by many researchers [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20]. In [7], the authors attempted to solve the DT nonlinear optimal control problem offline using ADP approaches and neural networks, assuming that there are no NN reconstruction errors. Based on the results of [7], other researchers developed offline ADP approaches for more complicated situations, such as the optimal tracking problem [8], [14], [21], optimal control with control constraints [10], optimal control with time delays [11], [12], and optimal control with finite approximation errors [15]. However, all of the above works required knowledge of the system dynamics and used offline tuning laws.
Since mathematical models of real-world system dynamics are often difficult to build, designing optimal controllers for nonlinear systems with unknown dynamics has become one of the main foci of control practitioners. The work of [9] analyzed convergence for unknown DT nonlinear systems using offline-trained neural networks, but this method introduced the Lebesgue integral [7], which required data from a subset of the plant, into the tuning law, and thus spent too much time on offline training. In [20], the authors developed a method to control unknown DT nonlinear systems using globalized dual heuristic programming, and the authors of [19] employed the single network dual heuristic dynamic programming (SN-DHP) technique in the ADP algorithm. Both introduced gradient-based adaptive tuning laws instead of the approach in [9]. However, without using recorded system data, iterations were needed in the tuning laws of [19], [20], and the critic NN and actor NN could not be updated with respect to time at each sampling interval. Moreover, although [9], [20], [21] constructed a NN to identify the unknown system dynamics, they assumed that the NN identification error approached zero, so the effects of the estimation error on the convergence of the actor–critic algorithms were not considered.
On the other hand, online adaptive optimal controller designs were presented in [17], [18], [22], [23], [24] to overcome the iterative offline training methodology. The central theme of the approaches in [23], [24], as well as several works in [22], is that the cost function and optimal control signal are approximated by online parametric structures, such as NNs. Although the methods proposed in [22], [23], [24] were verified via numerical simulations, the approximation errors were not considered and proofs of convergence were not given. The work of [17] presented a novel approach that relied on current and recorded system data for adaptation and proved convergence while accounting for the approximation errors; recently, the authors of [18] improved this method in the presence of unknown internal dynamics and called it the time-based ADP algorithm. However, because the tuning law requires knowledge of the control coefficient matrix, the general time-based ADP algorithm [17], [18] becomes invalid for unknown DT nonlinear systems. Meanwhile, most online adaptive optimal control algorithms with ADP require a persistence of excitation (PE) condition [25], [26], [27], which is important in NN identification. Refs. [17], [18] proposed a similar condition to ensure the PE requirement, but they did not give a lower bound in the proof.
The contributions of this paper lie in the development of an online adaptive learning algorithm that solves an infinite horizon optimal control problem for unknown DT nonlinear systems. The general time-based ADP technique requires knowledge of the system dynamics; by adding an identification process, the time-based ADP algorithm, which makes use of current and recorded system data, becomes applicable to the optimal control problem for unknown nonlinear systems. By using current and recorded system information, the PE condition is ensured by a new criterion with an explicit lower bound, and the unknown nonlinear DT system can be controlled once at each sampling instant. Convergence of the system states and of the NN implementation is demonstrated while explicitly considering all the NN reconstruction errors, in contrast to previous works [9], [20], [21].
Background
Consider the affine DT nonlinear system described by

x_{k+1} = f(x_k) + g(x_k)u(x_k),  (1)

where x_k ∈ R^n is the system state, u(x_k) ∈ R^m is the control input, f(x_k) ∈ R^n, and g(x_k) ∈ R^{n×m}. Without loss of generality, assume that the system is controllable, sufficiently smooth, and drift free, that x = 0 is a unique equilibrium point on a compact set, and that the states are measurable. In the following, u(x_k) is denoted by u_k for simplicity.
Define the infinite horizon cost function

J(x_k) = Σ_{i=k}^{∞} (x_i^T Q x_i + u_i^T R u_i),

where Q and R are positive definite weighting matrices.
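As a quick numerical illustration, the infinite-horizon quadratic cost above can be approximated by truncating the sum along a simulated trajectory. This is only a sketch: `f`, `g`, `policy`, and the horizon length are assumed placeholders, not quantities from the paper.

```python
import numpy as np

# Hypothetical sketch: evaluate a truncated version of the infinite-horizon
# quadratic cost J(x_k) = sum_i (x_i^T Q x_i + u_i^T R u_i) along a simulated
# trajectory. `f` and `g` stand in for the (in general unknown) dynamics.
def truncated_cost(x0, policy, f, g, Q, R, horizon=200):
    J, x = 0.0, np.asarray(x0, dtype=float)
    for _ in range(horizon):
        u = policy(x)
        J += x @ Q @ x + u @ R @ u              # one-step quadratic utility
        x = f(x) + g(x) @ u                     # x_{k+1} = f(x_k) + g(x_k) u_k
    return J
```

For a stable trajectory the truncation error decays geometrically, so a modest horizon already approximates the infinite sum well.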
NN identification of the unknown nonlinear system
To begin the NN identifier construction, the system dynamics (1) are rewritten as

x_{k+1} = F(x_k, u_k).

The function F has a NN representation on a compact set S according to the universal approximation property of NNs, which can be written as

F(z_k) = W^T σ(V^T z_k) + ε(z_k),

where W and V are the constant ideal weight matrices, the hidden layer contains a fixed number of neurons, σ(·) is the NN activation function, ε(z_k) is the NN reconstruction error, and z_k = [x_k^T, u_k^T]^T is the NN input.
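A minimal sketch of such a one-hidden-layer identifier, assuming fixed random inner weights and a gradient step on the one-step prediction error (the learning rate, activation, and initialization here are illustrative assumptions, not the paper's tuning law):

```python
import numpy as np

# Sketch of a one-hidden-layer NN identifier for x_{k+1} = F(x_k, u_k),
# with F_hat(z) = W^T tanh(V^T z) and z = [x_k; u_k]. The inner weights V
# are fixed at random and only the output weights W are tuned.
class NNIdentifier:
    def __init__(self, n_in, n_out, n_hidden=8, lr=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.V = rng.standard_normal((n_in, n_hidden))  # fixed inner weights
        self.W = np.zeros((n_hidden, n_out))            # tuned output weights
        self.lr = lr

    def predict(self, z):
        return self.W.T @ np.tanh(self.V.T @ z)

    def update(self, z, x_next):
        phi = self.V.T @ z
        phi = np.tanh(phi)                     # hidden-layer activation
        e = self.predict(z) - x_next           # identification error
        self.W -= self.lr * np.outer(phi, e)   # gradient step on ||e||^2 / 2
        return np.linalg.norm(e)
```

With bounded activations the gradient step is a standard least-mean-squares update over the hidden-layer features, so the identification error shrinks as data accumulate.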
Optimal control of unknown nonlinear DT systems
From the background section, it can be seen that the essential step in solving the DTHJB equation is to compute (3), (4) iteratively. However, since the system dynamics are unknown, the general time-based ADP algorithm [17], [18] cannot be implemented to solve these two equations. To circumvent this deficiency, the general optimal control (4) can be replaced by (15), which is based on the NN identification of the unknown nonlinear system. We then propose a novel time-based ADP algorithm.
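For intuition only: in the special case of a linear system x_{k+1} = A x_k + B u_k with quadratic cost, iterating a value update and a policy update of this kind reduces to the well-known discrete-time Riccati iteration, sketched below. A, B, Q, R are assumed example quantities; this is not the paper's algorithm, which handles unknown nonlinear dynamics.

```python
import numpy as np

# Illustrative value iteration for the linear-quadratic special case:
# V_{j+1}(x) = min_u { x^T Q x + u^T R u + V_j(Ax + Bu) } with V_j(x) = x^T P x
# collapses to the discrete Riccati iteration below.
def dt_value_iteration(A, B, Q, R, iters=500):
    P = np.zeros_like(Q)                                     # V_0(x) = 0
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)    # policy update
        P = Q + A.T @ P @ A - A.T @ P @ B @ K                # value update
    return P, K
```

The convergence of this linear-case recursion to the fixed point of the Riccati equation mirrors the role that iterating (3), (4) plays in the nonlinear setting.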
Simulations
This section presents a simulation example to illustrate the effectiveness of the proposed optimal adaptive control algorithm. Consider a second-order nonlinear discrete-time system whose dynamics are unknown. The weighting matrices of the quadratic cost function are chosen with R = 1.
We choose three-layer feedforward NNs as the model neural network, critic neural network, and action neural network, with structures 3–8–2, 2–8–1, and 2–8–1, respectively.
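The three network structures can be instantiated, for illustration, with a generic one-hidden-layer feedforward map; the tanh activation and uniform weight initialization below are assumptions, not taken from the paper.

```python
import numpy as np

# Hypothetical helper showing the three network structures used in the
# simulation: model NN (3-8-2), critic NN (2-8-1), action NN (2-8-1).
class MLP:
    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.V = rng.uniform(-0.5, 0.5, (n_in, n_hidden))   # input-to-hidden
        self.W = rng.uniform(-0.5, 0.5, (n_hidden, n_out))  # hidden-to-output

    def forward(self, z):
        return self.W.T @ np.tanh(self.V.T @ z)

model_nn = MLP(3, 8, 2)    # input [x1, x2, u], predicts the next state
critic_nn = MLP(2, 8, 1)   # input x_k, outputs the cost estimate
action_nn = MLP(2, 8, 1)   # input x_k, outputs the control estimate
```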
Conclusions
A new time-based ADP algorithm for unknown DT nonlinear systems, which is independent of knowledge of the system dynamics, has been proposed in this paper. By using current and recorded system information, the PE requirement has been ensured by a new assumption with an explicit lower bound, and the unknown nonlinear DT system can be controlled once at each sampling instant. By considering an appropriate Lyapunov function, we have proven the uniform ultimate boundedness (UUB) of the overall closed-loop system under the effects of the NN reconstruction errors.
Acknowledgement
This work was supported by the National Natural Science Foundation of China (61034005, 61433004), the National High Technology Research and Development Program of China (2012AA040104), and IAPI Fundamental Research Funds 2013ZCX14. This work was also supported by the development project of the Key Laboratory of Liaoning Province.
References (27)
- et al., Optimal control of unknown affine nonlinear discrete-time systems using offline-trained neural networks with proof of convergence, Neural Netw. (2009)
- et al., An optimal control scheme for a class of discrete-time nonlinear systems with time delays using adaptive dynamic programming, Acta Autom. Sin. (2010)
- et al., Finite-horizon neuro-optimal tracking control for a class of discrete-time nonlinear systems using adaptive dynamic programming approach, Neurocomputing (2012)
- et al., Neuro-optimal control for a class of unknown nonlinear dynamic systems using SN-DHP technique, Neurocomputing (2013)
- et al., Neural-network-based optimal tracking control scheme for a class of unknown discrete-time nonlinear systems using iterative ADP algorithm, Neurocomputing (2014)
- et al., Online actor–critic algorithm to solve the continuous-time infinite horizon optimal control problem, Automatica (2010)
- et al., Identification of nonlinear dynamical systems using multilayered neural networks, Automatica (1996)
- et al., Optimal Control (2012)
- et al., Reinforcement Learning: An Introduction (1998)
- Approximate Dynamic Programming: Solving the Curses of Dimensionality, Wiley Series in Probability and Statistics (2007)
- Reinforcement learning and feedback control: using natural decision methods to design optimal adaptive controllers, IEEE Control Syst.
Geyang Xiao received the B.S. degree in Automation Control from Northeastern University, Shenyang, China, in 2012. He has been pursuing the Ph.D. degree with Northeastern University, Shenyang, China, since 2012. His current research interests include neural networks-based controls, non-linear controls, adaptive dynamic programming, and their industrial applications.
Huaguang Zhang received the B.S. degree and the M.S. degree in control engineering from Northeast Dianli University of China, Jilin City, China, in 1982 and 1985, respectively. He received the Ph.D. degree in thermal power engineering and automation from Southeast University, Nanjing, China, in 1991. He joined the Department of Automatic Control, Northeastern University, Shenyang, China, in 1992, as a Postdoctoral Fellow for two years. Since 1994, he has been a Professor and Head of the Institute of Electric Automation, School of Information Science and Engineering, Northeastern University, Shenyang, China. His main research interests are fuzzy control, stochastic system control, neural networks based control, nonlinear control, and their applications. He has authored and coauthored over 200 journal and conference papers, four monographs and co-invented 20 patents.
Yanhong Luo received the B.S. degree in automation control, M.S. degree and Ph.D. degree in control theory and control engineering from Northeastern University, Shenyang, China, in 2003, 2006 and 2009, respectively. She is currently working in Northeastern University as an Associate Professor. Her research interests include approximate dynamic programming, neural networks adaptive control, fuzzy control and their industrial application.