
Neurocomputing

Volume 129, 10 April 2014, Pages 528-539

Fixed-final-time optimal tracking control of input-affine nonlinear systems

https://doi.org/10.1016/j.neucom.2013.09.006

Abstract

In this study, an approximate dynamic programming framework is utilized for solving the Bellman equation associated with the fixed-final-time optimal tracking problem for input-affine nonlinear systems. Convergence of the neurocontroller weights under the proposed successive-approximation-based algorithm is established, and the network is trained to provide the optimal solution to problems with (a) unspecified initial conditions, (b) different time horizons, and (c) different reference trajectories, under certain general conditions. Numerical simulations illustrate the versatility of the proposed neurocontroller.

Introduction

Approximate dynamic programming (ADP) has shown a lot of promise in solving optimal control problems with neural networks (NN) as the enabling structure [1], [2], [3], [4], [5], [6], [7], [8], [9], [10]. The mechanism for ADP is usually provided through a dual-network architecture called the Adaptive Critic (AC) [3], [2]. In the heuristic dynamic programming (HDP) class of ACs, one network, called the ‘critic’ network, maps the input states to the cost, and another network, called the ‘action’ network, outputs the control with the states of the system as its inputs [4], [5]. In the dual heuristic programming (DHP) formulation, while the action network remains the same as in HDP, the critic network outputs the costates with the current states as inputs [2], [6], [7]. The convergence proof of DHP for linear systems is presented in [8] and that of HDP for the general case is presented in [4]. The Single Network Adaptive Critic (SNAC) architecture developed in [9] eliminates the need for the second network and performs DHP using only one network. This results in a considerable decrease in the offline training effort, and the resulting simplicity makes the scheme attractive for online implementation requiring less computational resources and storage memory. Similarly, the cost-function-based SNAC (J-SNAC) eliminates the need for the action network in an HDP scheme [10]. While [2], [3], [4], [5], [6], [7], [8], [9], [10] deal with discrete-time systems, some researchers have recently focused on continuous-time problems [11], [12], [13].
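The critic/action cycle described above can be illustrated on the simplest possible case. In the following toy sketch (an illustration of the general ADP idea, not any cited paper's algorithm), the "critic" collapses to the single coefficient p in J(x) = p·x², and each sweep performs the greedy-action step followed by a Bellman backup; all parameter values are assumptions chosen for the demo.

```python
# Illustrative HDP-style value iteration on a scalar LQ problem: the
# "critic" is the coefficient p in J(x) = p x^2; each sweep computes the
# greedy action and then backs the critic up through the Bellman equation.
a, b, q, r = 0.9, 1.0, 1.0, 1.0   # x_{k+1} = a x_k + b u_k, cost q x^2 + r u^2

p = 0.0                            # initial critic guess
for _ in range(200):
    u_gain = -(a * b * p) / (r + b * b * p)   # action: u = u_gain * x
    ac = a + b * u_gain                        # closed-loop multiplier
    p = q + r * u_gain**2 + p * ac**2          # Bellman backup on the critic

# at convergence p satisfies the discrete algebraic Riccati equation
residual = p - (q + a * a * p - (a * b * p)**2 / (r + b * b * p))
print(p, residual)
```

For this scalar case the converged critic can be checked directly against the Riccati fixed point; with richer basis functions the same cycle underlies the NN-based schemes surveyed above.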

Note that these developments in the neural network (NN) literature have mainly addressed infinite-horizon, regulator-type problems. Finite-horizon optimal control is relatively more difficult because the time-varying Hamilton–Jacobi–Bellman (HJB) equation results in a time-to-go-dependent optimal cost function and costates. Using numerical methods [14], a two-point boundary value problem (TPBVP) needs to be solved for each set of initial conditions and each final time, and it provides only an open-loop solution. The control loop can be closed using techniques such as Model Predictive Control (MPC), as done in [15]; however, the result is valid only for one set of initial conditions and one final time. This limitation holds for the method developed in [16] as well. Ref. [17] develops a dynamic optimization scheme that gives an open-loop solution; optimal tracking is then used to reject online perturbations and deviations from the optimal trajectory. The authors of [18] used power series to solve the problem with small nonlinearities, and in [19] an approximate solution is given through the so-called Finite-horizon State Dependent Riccati Equation (Finite-SDRE) method.

Neural networks have been used for solving finite-horizon optimal control problems in [20], [21], [22], [23], [24], [25]. The authors of [20] developed a neurocontroller for a scalar problem with a terminal constraint using an AC. Continuous-time problems are considered in [21] and [22], where the time-dependent weights are calculated through a backward integration. The finite-horizon problem with unspecified terminal time and a fixed terminal state is considered in [23], [24]. The neurocontroller developed in [23] works only with one set of initial conditions; if the initial state is changed, the network needs to be re-trained to give the optimal solution for the new state. This limitation holds for [24] as well.

In many practical systems one is interested in tracking a desired signal. Examples of such systems are contour tracking in machining processes [34], [35] and control of robotic manipulators [36]. In some systems the tracking must be carried out in a given time; see [37] for an example of such a case in an autopilot design. The constraint of a fixed final time makes the problem much more difficult to solve. Missile guidance and launch vehicle problems are other applications in this class. Solving optimal tracking problems for nonlinear systems using adaptive critics has been investigated in [26], [27], [28], [29], [30], [31], [32]. In [26] the authors developed a tracking controller for systems whose input gain matrix is invertible. In [27] the reference signal is limited to those that satisfy the dynamics of the system. The developments in [28], [29], [30], [31] solve the tracking problem for systems in nonlinear Brunovsky canonical form. Finally, the finite-horizon tracking neurocontroller developed in [32] handles only one set of initial conditions and requires the input gain matrix of the dynamics to be invertible.

In this paper, a single-neural-network solution, called Finite-horizon Single Network Adaptive Critic (Finite-SNAC), is developed which embeds an approximate solution to the discrete-time HJB equation for fixed-final-time optimal tracking problems. The approximation can be made as accurate as desired using rich enough basis functions. Consequently, the offline-trained network can be used to generate feedback control online in real time. The neurocontroller solves the optimal tracking problem for general nonlinear control-affine dynamics, tracking either a given arbitrary trajectory or a family of trajectories that share the same, possibly nonlinear, dynamics. Once trained, the network gives the optimal solution for every initial condition, as long as the resulting trajectory lies in the domain for which the network is trained; hence, Finite-SNAC does not have the restrictions of some of the cited references in the field. Furthermore, a major advantage of the proposed technique is that the network provides optimal feedback solutions for any final time less than the final time for which it was synthesized. An innovative proof is developed which shows that the successive-approximation-based training algorithm is a contraction mapping [33].

Comparing the controller developed in this paper with the intelligent controllers available in the literature, the closest are [20], [25]. Compared to [20], only one network is needed here for computing the control, the idea is generalized to tracking with a free final state, and convergence proofs are provided. The differences between this study and [25] are (a) solving the tracking problem versus bringing the states to zero in [25], (b) using time-varying weights for the neural networks as opposed to the time-invariant weights in that reference, (c) developing a ‘backward in time’ training algorithm versus the purely ‘iterative’ algorithm in [25], and (d) providing a completely different convergence proof. The advantages of this study over the available finite-horizon optimal tracking methods in the literature [32], [37] are that it provides solutions for different initial conditions and different final times without retraining the network, as required in [32], or recalculating the series of differential Riccati equations until convergence, as required in [37]. Moreover, the restriction of an invertible input-gain matrix in [32] does not exist here. Finally, the advantage of this study over the MPC approach utilized in [15] for optimal tracking is a negligible online computational load, versus the heavy real-time computational load of MPC's numerical calculation of the optimal solution at each instant, as detailed in [15]. Here, once the network is trained offline, the online calculation of the control is as simple as feeding the states to the network to get the costate vector and, hence, the control.

The rest of the paper is organized as follows: Finite-SNAC is developed in Section 2. Relevant convergence theorems are presented in Section 3. A modified version of the controller with higher versatility is proposed in Section 4, and numerical results and analyses are presented in Section 5. Finally, conclusions are given in Section 6.

Section snippets

Theory of Finite-SNAC

Consider the nonlinear continuous-time input-affine system ẋ(t) = f_c(x(t)) + g_c(x(t))u(t), where x(t) ∈ ℝⁿ and u(t) ∈ ℝˡ denote the state and the control vectors at time t, respectively, and the parameters n and l are the dimensions of the state and control vectors. Smooth functions f_c: ℝⁿ → ℝⁿ and g_c: ℝⁿ → ℝⁿˣˡ are the system dynamics and the initial states are given by x(0). Given a reference signal r(t) ∈ ℝⁿ for t ∈ [0, t_f], where the initial time is selected as zero and the final time is denoted by t_f, the objective
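For training and control in discrete time steps, the continuous dynamics above must be sampled. A minimal sketch using a forward-Euler step of size dt follows; Euler is an assumption here for illustration, and the toy f_c and g_c are not the paper's example system.

```python
import numpy as np

# Forward-Euler discretization of the input-affine dynamics
# xdot = f_c(x) + g_c(x) u with sampling time dt (an assumed scheme,
# shown only to make the sampled-data setting concrete).
def discretize(f_c, g_c, dt):
    def step(x, u):
        return x + dt * (f_c(x) + g_c(x) @ u)
    return step

# usage on a toy 2-state, 1-input system
f_c = lambda x: np.array([x[1], -x[0]])
g_c = lambda x: np.array([[0.0], [1.0]])
step = discretize(f_c, g_c, dt=0.01)
x1 = step(np.array([1.0, 0.0]), np.array([0.0]))
```

Any higher-order integration scheme could replace the Euler step; what matters for the development is that the sampled dynamics map (x_k, u_k) to x_{k+1}.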

Convergence theorems

The proposed algorithm for Finite-SNAC training is based on DHP, that is, it learns the optimal costate vector through a successive approximation scheme. Starting with an initial value for the costate vector, one iterates until convergence to the optimal costate. Here, convergence of the weights under the training scheme given in Algorithm 1 to the optimal weights is proved.
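The backward-in-time sweep with an inner successive-approximation loop can be illustrated on a scalar finite-horizon LQ problem, where the "network" at step k collapses to a single gain s[k] in λ_k = s[k]·x_k. This is a toy stand-in for the structure of the scheme, not the Finite-SNAC training itself, and all parameter values are assumptions.

```python
# Toy backward-in-time successive-approximation sweep on a scalar
# finite-horizon LQ problem with Euler-discretized xdot = -x + u,
# running cost q x^2 + r u^2, and costate lambda_k = s[k] * x_k.
dt = 0.1
a, b = 1.0 + dt * (-1.0), dt * 1.0   # discrete dynamics x_{k+1} = a x + b u
q, r = dt * 1.0, dt * 1.0            # discretized cost weights
N = 50                               # number of steps in the horizon

s = [0.0] * (N + 1)                  # zero terminal cost => lambda_N = 0
for k in range(N - 1, -1, -1):
    g = 0.0                          # guess for the control gain, u = g x
    for _ in range(100):             # successive approximation at step k:
        # u_k = -(b/(2r)) lambda_{k+1}(x_{k+1}), with x_{k+1} = (a + b g) x_k
        g = -(b * s[k + 1] / (2.0 * r)) * (a + b * g)
    # costate recursion lambda_k = 2 q x_k + a * lambda_{k+1}(x_{k+1})
    s[k] = 2.0 * q + a * s[k + 1] * (a + b * g)

# gains grow with time-to-go toward the infinite-horizon value
print(s[0], s[N - 1])
```

The inner loop is a contraction for small enough dt, mirroring the role the sampling time plays in the convergence theorem below; the costate gains increase monotonically with time-to-go, as expected for a finite-horizon problem.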

Theorem 1

Selecting any finite initial guess W_k⁰ for k ∈ {0, 1, …, N−1}, there exists some sampling time Δt for discretization of

A modification on the Finite-SNAC for optimal tracking of a family of reference signals

The finite-time optimal tracking control at each time step k is a function of the system states, the reference signal, and the time-to-go. As seen in (9), the network structure is selected such that only the system state is fed to the network; the dependence of the control on the reference signal and the time-to-go is learned by the time-step-dependent weight matrix. This synthesis is suitable for cases where the reference signal is fixed and, hence, the network can be trained
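The state-only structure with time-step-dependent weights can be sketched as follows. The polynomial basis φ and the weight shapes are illustrative assumptions (the paper's basis in (9) is not reproduced here); the weights below are untrained random placeholders.

```python
import numpy as np

# Sketch of a SNAC-style costate approximator with time-step-dependent
# weights: lambda_{k+1} ~ W_k^T phi(x_k). Basis and shapes are assumed.
def phi(x):
    x1, x2 = x
    return np.array([x1, x2, x1**2, x1 * x2, x2**2, x1**3, x2**3])

N, n = 30, 2                               # horizon steps, state dimension
m = phi(np.zeros(2)).size                  # number of basis functions
rng = np.random.default_rng(1)
W = [rng.normal(scale=0.1, size=(m, n)) for _ in range(N)]  # one W per step

def costate(k, x):
    """Costate estimate lambda_{k+1} at time step k (untrained demo)."""
    return W[k].T @ phi(x)

lam = costate(0, np.array([0.5, -0.2]))    # one costate evaluation
```

The modification discussed in this section would instead augment the network input so that a single structure generalizes over a family of reference signals rather than encoding one fixed reference in the per-step weights.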

Numerical analysis

Example 1

As the first example, the NN structure given in (9) is simulated based on Algorithm 1. A nonlinear benchmark system, the Van der Pol oscillator, is selected with the dynamics

ẋ₁ = x₂
ẋ₂ = (1 − x₁²)x₂ − x₁ + u
where xᵢ, i = 1, 2, denotes the elements of the state vector. The reference trajectory to be tracked is selected as r(t) = [r₁(t) r₂(t)]ᵀ, generated using the dynamics
ṙ(t) = [sin(πt) πcos(πt)]ᵀ
and the initial condition r(0) = [0 0]ᵀ. The fixed horizon is selected as 3 s and the sampling time of Δt =
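The example setup above can be reproduced with a simple forward-Euler integration of the Van der Pol dynamics and the reference generator over the 3 s horizon. The control is left at u = 0 as a placeholder (the trained neurocontroller would supply it), so this sketch only exercises the dynamics and the reference model; the step size is an assumption.

```python
import numpy as np

# Euler integration of the Van der Pol plant and the reference generator
# rdot = [sin(pi t), pi cos(pi t)]^T, r(0) = [0 0]^T, over a 3 s horizon.
dt, tf = 0.01, 3.0
x = np.array([0.0, 0.0])           # plant state [x1, x2]
r = np.array([0.0, 0.0])           # reference state

for k in range(int(tf / dt)):
    t = k * dt
    u = 0.0                        # placeholder control
    xdot = np.array([x[1], (1.0 - x[0]**2) * x[1] - x[0] + u])
    rdot = np.array([np.sin(np.pi * t), np.pi * np.cos(np.pi * t)])
    x = x + dt * xdot
    r = r + dt * rdot

# analytically r1(3) = (1 - cos(3*pi))/pi = 2/pi and r2(3) = sin(3*pi) = 0
print(r)
```

The closed forms r₁(t) = (1 − cos(πt))/π and r₂(t) = sin(πt) provide a check on the integration.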

Conclusions

An approximate dynamic programming based neurocontroller is developed for the optimal tracking control of nonlinear systems, and proofs are given for the convergence of the weights and the optimality of the results. The controller does not have the limitations of the cited optimal tracking controllers and can learn to track either a given reference trajectory or a family of trajectories that share the same dynamics, e.g., different trajectories resulting from different initial conditions.

Acknowledgment

This work was partially supported by a grant from the National Science Foundation.


References (45)

  • P.J. Werbos, Approximate dynamic programming for real-time control and neural modeling.
  • S.N. Balakrishnan et al., Adaptive-critic based neural networks for aircraft optimal control, J. Guidance, Control Dyn. (1996).
  • D.V. Prokhorov et al., Adaptive critic designs, IEEE Trans. Neural Networks (1997).
  • A. Al-Tamimi et al., Discrete-time nonlinear HJB solution using approximate dynamic programming: convergence proof, IEEE Trans. Syst., Man, Cybern. B, Cybern. (2008).
  • S. Ferrari et al., Online adaptive critic flight control, J. Guidance, Control Dyn. (2004).
  • G.K. Venayagamoorthy et al., Comparison of heuristic dynamic programming and dual heuristic programming adaptive critics for neurocontrol of a turbogenerator, IEEE Trans. Neural Networks (2002).
  • X. Liu, S.N. Balakrishnan, Convergence analysis of adaptive critic based optimal control, in: Proceedings of the...
  • J. Ding, S.N. Balakrishnan, An online nonlinear optimal controller synthesis for aircraft with model uncertainties, in:...
  • T. Dierks, S. Jagannathan, Optimal control of affine nonlinear continuous-time systems, in: Proceedings of the American...
  • D.E. Kirk, Optimal Control Theory: An Introduction (2004).
  • J. Liang, Optimal magnetic attitude control of small spacecraft (2005).
  • S.R. Vadali et al., Optimal finite-time feedback controllers for nonlinear systems with terminal constraints, J. Guidance, Control, Dyn. (2006).

    Ali Heydari received his Ph.D. degree in mechanical engineering from the Missouri University of Science and Technology in 2013. He is currently an Assistant Professor of Mechanical Engineering at the South Dakota School of Mines and Technology. He was the recipient of the Outstanding M.Sc. Thesis Award from the Iranian Aerospace Society, the Best Student Paper Runner-Up Award from the AIAA Guidance, Navigation and Control Conference, and the Outstanding Graduate Teaching Award from the Academy of Mechanical and Aerospace Engineers at Missouri S&T. His research interests include optimal control, nonlinear control, approximate dynamic programming, and control of hybrid and switching systems.

    S. N. Balakrishnan received his Ph.D. degree in Aerospace Engineering from the University of Texas in Austin. He is currently Curators' Professor of Aerospace Engineering in the Department of Mechanical and Aerospace Engineering at Missouri University of Science Technology in Rolla, Missouri. His research interests include neural networks, optimal control, estimation, nonlinear control, control of large scale systems and impulse driven systems. His papers mainly deal with development of algorithms in control and estimation and applications to aircrafts, missile, spacecraft, launch vehicles, robots, manufacturing and other interesting systems.
