
Neurocomputing

Volume 129, 10 April 2014, Pages 528-539

Fixed-final-time optimal tracking control of input-affine nonlinear systems

https://doi.org/10.1016/j.neucom.2013.09.006

Abstract

In this study, an approximate dynamic programming framework is utilized for solving the Bellman equation associated with the fixed-final-time optimal tracking problem for input-affine nonlinear systems. Convergence of the neurocontroller weights under the proposed successive-approximation-based algorithm is established, and the network is trained to provide the optimal solution to problems with (a) unspecified initial conditions, (b) different time horizons, and (c) different reference trajectories, under certain general conditions. Numerical simulations illustrate the versatility of the proposed neurocontroller.

Introduction

Approximate dynamic programming (ADP) has shown a lot of promise in solving optimal control problems with neural networks (NN) as the enabling structure [1], [2], [3], [4], [5], [6], [7], [8], [9], [10]. The mechanism for ADP is usually provided through a dual-network architecture called the Adaptive Critic (AC) [3], [2]. In the heuristic dynamic programming (HDP) class of ACs, one network, called the ‘critic’ network, maps the input states to the cost, and another network, called the ‘action’ network, outputs the control with the states of the system as its inputs [4], [5]. In the dual heuristic programming (DHP) formulation, while the action network remains the same as in HDP, the critic network outputs the costates with the current states as inputs [2], [6], [7]. The convergence proof of DHP for linear systems is presented in [8] and that of HDP for the general case is presented in [4]. The Single Network Adaptive Critic (SNAC) architecture developed in [9] eliminates the need for the second network and performs DHP using only one network. This results in a considerable decrease in the offline training effort, and the resulting simplicity makes the scheme attractive for online implementation requiring less computational resources and storage memory. Similarly, the cost-function-based SNAC (J-SNAC) eliminates the need for the action network in an HDP scheme [10]. While [2], [3], [4], [5], [6], [7], [8], [9], [10] deal with discrete-time systems, some researchers have recently focused on continuous-time problems [11], [12], [13].
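The critic/action cycle described above can be illustrated on the simplest possible case. In the following toy sketch (an illustration of the general ADP idea, not any cited paper's algorithm), the "critic" collapses to the single coefficient p in J(x) = p·x², and each sweep performs the greedy-action step followed by a Bellman backup; all parameter values are assumptions chosen for the demo.

```python
# Illustrative HDP-style value iteration on a scalar LQ problem: the
# "critic" is the coefficient p in J(x) = p x^2; each sweep computes the
# greedy action and then backs the critic up through the Bellman equation.
a, b, q, r = 0.9, 1.0, 1.0, 1.0   # x_{k+1} = a x_k + b u_k, cost q x^2 + r u^2

p = 0.0                            # initial critic guess
for _ in range(200):
    u_gain = -(a * b * p) / (r + b * b * p)   # action: u = u_gain * x
    ac = a + b * u_gain                        # closed-loop multiplier
    p = q + r * u_gain**2 + p * ac**2          # Bellman backup on the critic

# at convergence p satisfies the discrete algebraic Riccati equation
residual = p - (q + a * a * p - (a * b * p)**2 / (r + b * b * p))
print(p, residual)
```

For this scalar case the converged critic can be checked directly against the Riccati fixed point; with richer basis functions the same cycle underlies the NN-based schemes surveyed above.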

Note that these developments in the neural network (NN) literature have mainly addressed infinite-horizon, regulator-type problems. Finite-horizon optimal control is relatively more difficult because the time-varying Hamilton–Jacobi–Bellman (HJB) equation results in a time-to-go-dependent optimal cost function and costates. Using numerical methods [14], a two-point boundary value problem (TPBVP) needs to be solved for each set of initial conditions and each final time, and it provides only an open-loop solution. The control loop can be closed using techniques such as Model Predictive Control (MPC), as done in [15]; however, the result is valid only for one set of initial conditions and one final time. This limitation holds for the method developed in [16] as well. Ref. [17] develops a dynamic optimization scheme that gives an open-loop solution; optimal tracking is then used to reject online perturbations and deviations from the optimal trajectory. The authors of [18] used power series to solve the problem with small nonlinearities, and in [19] an approximate solution is given through the so-called Finite-horizon State Dependent Riccati Equation (Finite-SDRE) method.

Neural networks have been used for solving finite-horizon optimal control problems in [20], [21], [22], [23], [24], [25]. The authors of [20] developed a neurocontroller for a scalar problem with a terminal constraint using an AC. Continuous-time problems are considered in [21] and [22], where the time-dependent weights are calculated through a backward integration. The finite-horizon problem with unspecified terminal time and a fixed terminal state is considered in [23], [24]. The neurocontroller developed in [23] works only with one set of initial conditions; if the initial state is changed, the network needs to be re-trained to give the optimal solution for the new state. This limitation holds for [24] as well.

In many practical systems one is interested in tracking a desired signal. Examples of such systems are contour tracking in machining processes [34], [35] and control of robotic manipulators [36]. In some systems the tracking must be carried out in a given time; see [37] for an example of such a case in an autopilot design. The constraint of a fixed final time makes the problem much more difficult to solve. Missile guidance and launch vehicle problems are other applications in this class. Solving optimal tracking problems for nonlinear systems using adaptive critics has been investigated in [26], [27], [28], [29], [30], [31], [32]. In [26] the authors developed a tracking controller for systems whose input gain matrix is invertible. In [27] the reference signal is limited to those that satisfy the dynamics of the system. The developments in [28], [29], [30], [31] solve the tracking problem for systems in nonlinear Brunovsky canonical form. Finally, the finite-horizon tracking neurocontroller developed in [32] handles only one set of initial conditions and requires the input gain matrix of the dynamics to be invertible.

In this paper, a single-neural-network solution, called Finite-horizon Single Network Adaptive Critic (Finite-SNAC), is developed which embeds an approximate solution to the discrete-time HJB equation for fixed-final-time optimal tracking problems. The approximation can be made as accurate as desired using rich enough basis functions. Consequently, the offline-trained network can be used to generate feedback control online in real time. The neurocontroller solves the optimal tracking problem for general nonlinear control-affine dynamics, tracking either a given arbitrary trajectory or a family of trajectories that share the same, possibly nonlinear, dynamics. Once trained, the network gives the optimal solution for every initial condition, as long as the resulting trajectory lies in the domain for which the network is trained; hence, Finite-SNAC does not have the restrictions of some of the cited references in the field. Furthermore, a major advantage of the proposed technique is that the network provides optimal feedback solutions for any final time less than the final time for which it was synthesized. An innovative proof is developed which shows that the successive-approximation-based training algorithm is a contraction mapping [33].

Comparing the controller developed in this paper with the intelligent controllers available in the literature, the closest are [20], [25]. Compared to [20], only one network is needed here for computing the control, the idea is generalized to tracking with a free final state, and convergence proofs are provided. The differences between this study and [25] are (a) solving the tracking problem versus bringing the states to zero in [25], (b) using time-varying weights for the neural networks as opposed to the time-invariant weights in that reference, (c) developing a ‘backward in time’ training algorithm versus the purely ‘iterative’ algorithm in [25], and (d) providing a completely different convergence proof. The advantages of this study over the available finite-horizon optimal tracking methods in the literature [32], [37] are that it provides solutions for different initial conditions and different final times without retraining the network, as required in [32], or recalculating the series of differential Riccati equations until convergence, as required in [37]. Moreover, the restriction of an invertible input-gain matrix in [32] does not exist here. Finally, the advantage of this study over the MPC approach utilized in [15] for optimal tracking is a negligible online computational load, versus the heavy real-time computational load of MPC's numerical calculation of the optimal solution at each instant, as detailed in [15]. Here, once the network is trained offline, the online calculation of the control is as simple as feeding the states to the network to get the costate vector and, hence, the control.

The rest of the paper is organized as follows: Finite-SNAC is developed in Section 2. Relevant convergence theorems are presented in Section 3. A modified version of the controller with higher versatility is proposed in Section 4, and numerical results and analyses are presented in Section 5. Finally, conclusions are given in Section 6.

Section snippets

Theory of Finite-SNAC

Consider the nonlinear continuous-time input-affine system ẋ(t) = f_c(x(t)) + g_c(x(t))u(t), where x(t) ∈ ℝⁿ and u(t) ∈ ℝˡ denote the state and the control vectors at time t, respectively, and the parameters n and l are the dimensions of the state and control vectors. Smooth functions f_c: ℝⁿ → ℝⁿ and g_c: ℝⁿ → ℝⁿˣˡ are the system dynamics and the initial states are given by x(0). Given a reference signal r(t) ∈ ℝⁿ for t ∈ [0, t_f], where the initial time is selected as zero and the final time is denoted by t_f, the objective
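For training and control in discrete time steps, the continuous dynamics above must be sampled. A minimal sketch using a forward-Euler step of size dt follows; Euler is an assumption here for illustration, and the toy f_c and g_c are not the paper's example system.

```python
import numpy as np

# Forward-Euler discretization of the input-affine dynamics
# xdot = f_c(x) + g_c(x) u with sampling time dt (an assumed scheme,
# shown only to make the sampled-data setting concrete).
def discretize(f_c, g_c, dt):
    def step(x, u):
        return x + dt * (f_c(x) + g_c(x) @ u)
    return step

# usage on a toy 2-state, 1-input system
f_c = lambda x: np.array([x[1], -x[0]])
g_c = lambda x: np.array([[0.0], [1.0]])
step = discretize(f_c, g_c, dt=0.01)
x1 = step(np.array([1.0, 0.0]), np.array([0.0]))
```

Any higher-order integration scheme could replace the Euler step; what matters for the development is that the sampled dynamics map (x_k, u_k) to x_{k+1}.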

Convergence theorems

The proposed algorithm for Finite-SNAC training is based on DHP, that is, it learns the optimal costate vector through a successive approximation scheme. Starting with an initial value for the costate vector, one iterates until convergence to the optimal costate. Here, convergence of the weights under the training scheme given in Algorithm 1 to the optimal weights is proved.
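The backward-in-time sweep with an inner successive-approximation loop can be illustrated on a scalar finite-horizon LQ problem, where the "network" at step k collapses to a single gain s[k] in λ_k = s[k]·x_k. This is a toy stand-in for the structure of the scheme, not the Finite-SNAC training itself, and all parameter values are assumptions.

```python
# Toy backward-in-time successive-approximation sweep on a scalar
# finite-horizon LQ problem with Euler-discretized xdot = -x + u,
# running cost q x^2 + r u^2, and costate lambda_k = s[k] * x_k.
dt = 0.1
a, b = 1.0 + dt * (-1.0), dt * 1.0   # discrete dynamics x_{k+1} = a x + b u
q, r = dt * 1.0, dt * 1.0            # discretized cost weights
N = 50                               # number of steps in the horizon

s = [0.0] * (N + 1)                  # zero terminal cost => lambda_N = 0
for k in range(N - 1, -1, -1):
    g = 0.0                          # guess for the control gain, u = g x
    for _ in range(100):             # successive approximation at step k:
        # u_k = -(b/(2r)) lambda_{k+1}(x_{k+1}), with x_{k+1} = (a + b g) x_k
        g = -(b * s[k + 1] / (2.0 * r)) * (a + b * g)
    # costate recursion lambda_k = 2 q x_k + a * lambda_{k+1}(x_{k+1})
    s[k] = 2.0 * q + a * s[k + 1] * (a + b * g)

# gains grow with time-to-go toward the infinite-horizon value
print(s[0], s[N - 1])
```

The inner loop is a contraction for small enough dt, mirroring the role the sampling time plays in the convergence theorem below; the costate gains increase monotonically with time-to-go, as expected for a finite-horizon problem.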

Theorem 1

Selecting any finite initial guess W_k⁰ for k ∈ {0, 1, …, N−1}, there exists some sampling time Δt for discretization of

A modification on the Finite-SNAC for optimal tracking of a family of reference signals

The finite-time optimal tracking control at each time step k is a function of the system states, the reference signal, and the time-to-go. As seen in (9), the network structure is selected such that only the system state is fed to the network; the dependence of the control on the reference signal and the time-to-go is learned by the time-step-dependent weight matrix. This synthesis is suitable for cases where the reference signal is fixed and, hence, the network can be trained
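The state-only structure with time-step-dependent weights can be sketched as follows. The polynomial basis φ and the weight shapes are illustrative assumptions (the paper's basis in (9) is not reproduced here); the weights below are untrained random placeholders.

```python
import numpy as np

# Sketch of a SNAC-style costate approximator with time-step-dependent
# weights: lambda_{k+1} ~ W_k^T phi(x_k). Basis and shapes are assumed.
def phi(x):
    x1, x2 = x
    return np.array([x1, x2, x1**2, x1 * x2, x2**2, x1**3, x2**3])

N, n = 30, 2                               # horizon steps, state dimension
m = phi(np.zeros(2)).size                  # number of basis functions
rng = np.random.default_rng(1)
W = [rng.normal(scale=0.1, size=(m, n)) for _ in range(N)]  # one W per step

def costate(k, x):
    """Costate estimate lambda_{k+1} at time step k (untrained demo)."""
    return W[k].T @ phi(x)

lam = costate(0, np.array([0.5, -0.2]))    # one costate evaluation
```

The modification discussed in this section would instead augment the network input so that a single structure generalizes over a family of reference signals rather than encoding one fixed reference in the per-step weights.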

Numerical analysis

Example 1

As the first example, the NN structure given in (9) is simulated based on Algorithm 1. A nonlinear benchmark system, the Van der Pol oscillator, is selected with the dynamics

ẋ₁ = x₂
ẋ₂ = (1 − x₁²)x₂ − x₁ + u
where xᵢ, i = 1, 2, denotes the elements of the state vector. The reference trajectory to be tracked is selected as r(t) = [r₁(t) r₂(t)]ᵀ, generated using the dynamics
ṙ(t) = [sin(πt) πcos(πt)]ᵀ
and the initial condition r(0) = [0 0]ᵀ. The fixed horizon is selected as 3 s and the sampling time of Δt =
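The example setup above can be reproduced with a simple forward-Euler integration of the Van der Pol dynamics and the reference generator over the 3 s horizon. The control is left at u = 0 as a placeholder (the trained neurocontroller would supply it), so this sketch only exercises the dynamics and the reference model; the step size is an assumption.

```python
import numpy as np

# Euler integration of the Van der Pol plant and the reference generator
# rdot = [sin(pi t), pi cos(pi t)]^T, r(0) = [0 0]^T, over a 3 s horizon.
dt, tf = 0.01, 3.0
x = np.array([0.0, 0.0])           # plant state [x1, x2]
r = np.array([0.0, 0.0])           # reference state

for k in range(int(tf / dt)):
    t = k * dt
    u = 0.0                        # placeholder control
    xdot = np.array([x[1], (1.0 - x[0]**2) * x[1] - x[0] + u])
    rdot = np.array([np.sin(np.pi * t), np.pi * np.cos(np.pi * t)])
    x = x + dt * xdot
    r = r + dt * rdot

# analytically r1(3) = (1 - cos(3*pi))/pi = 2/pi and r2(3) = sin(3*pi) = 0
print(r)
```

The closed forms r₁(t) = (1 − cos(πt))/π and r₂(t) = sin(πt) provide a check on the integration.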

Conclusions

An approximate dynamic programming based neurocontroller is developed for the optimal tracking control of nonlinear systems, and proofs are given for the convergence of the weights and the optimality of the results. The controller does not have the limitations of the cited optimal tracking controllers and can learn to track either a given reference trajectory or a family of trajectories that share the same dynamics, e.g., different trajectories resulting from different initial conditions.

Acknowledgment

This work was partially supported by a grant from the National Science Foundation.


References (45)

  • P.J. Werbos, Approximate dynamic programming for real-time control and neural modeling.
  • S.N. Balakrishnan et al., Adaptive-critic based neural networks for aircraft optimal control, J. Guidance, Control Dyn. (1996).
  • D.V. Prokhorov et al., Adaptive critic designs, IEEE Trans. Neural Networks (1997).
  • A. Al-Tamimi et al., Discrete-time nonlinear HJB solution using approximate dynamic programming: convergence proof, IEEE Trans. Syst., Man, Cybern. B, Cybern. (2008).
  • S. Ferrari et al., Online adaptive critic flight control, J. Guidance, Control Dyn. (2004).
  • G.K. Venayagamoorthy et al., Comparison of heuristic dynamic programming and dual heuristic programming adaptive critics for neurocontrol of a turbogenerator, IEEE Trans. Neural Networks (2002).
  • X. Liu, S.N. Balakrishnan, Convergence analysis of adaptive critic based optimal control, in: Proceedings of the...
  • J. Ding, S.N. Balakrishnan, An online nonlinear optimal controller synthesis for aircraft with model uncertainties, in:...
  • T. Dierks, S. Jagannathan, Optimal control of affine nonlinear continuous-time systems, in: Proceedings of the American...
  • D.E. Kirk, Optimal Control Theory: An Introduction (2004).
  • J. Liang, Optimal magnetic attitude control of small spacecraft (2005).
  • S.R. Vadali et al., Optimal finite-time feedback controllers for nonlinear systems with terminal constraints, J. Guidance, Control, Dyn. (2006).

    Ali Heydari received his Ph.D. degree in mechanical engineering from the Missouri University of Science and Technology in 2013. He is currently an Assistant Professor of Mechanical Engineering at the South Dakota School of Mines and Technology. He was the recipient of the Outstanding M.Sc. Thesis Award from the Iranian Aerospace Society, the Best Student Paper Runner-Up Award from the AIAA Guidance, Navigation and Control Conference, and the Outstanding Graduate Teaching Award from the Academy of Mechanical and Aerospace Engineers at Missouri S&T. His research interests include optimal control, nonlinear control, approximate dynamic programming, and control of hybrid and switching systems.

    S. N. Balakrishnan received his Ph.D. degree in Aerospace Engineering from the University of Texas in Austin. He is currently Curators' Professor of Aerospace Engineering in the Department of Mechanical and Aerospace Engineering at Missouri University of Science Technology in Rolla, Missouri. His research interests include neural networks, optimal control, estimation, nonlinear control, control of large scale systems and impulse driven systems. His papers mainly deal with development of algorithms in control and estimation and applications to aircrafts, missile, spacecraft, launch vehicles, robots, manufacturing and other interesting systems.
