
Systems & Control Letters

Volume 62, Issue 2, February 2013, Pages 97-103

Approximate finite-horizon optimal control without PDEs

https://doi.org/10.1016/j.sysconle.2012.08.015

Abstract

The problem of controlling the state of a system, from a given initial condition, over a fixed time interval while simultaneously minimizing a criterion of optimality is commonly referred to as the finite-horizon optimal control problem. One of the standard approaches to the finite-horizon optimal control problem relies upon the solution of the Hamilton–Jacobi–Bellman (HJB) partial differential equation, which may be difficult or impossible to obtain in closed form. Herein we propose a methodology that avoids the explicit solution of the HJB pde by exploiting a dynamic extension. This results in a dynamic time-varying state feedback yielding an approximate solution to the finite-horizon optimal control problem.

Introduction

The problem of controlling the state of a system during a desired time interval, generally set a priori, while minimizing a criterion of optimality along the trajectory of the resulting closed-loop system is crucial in control system applications [1], [2], [3]. The problem informally defined above is commonly referred to as the finite-horizon optimal control problem.

Two different approaches are available in the literature to solve the problem, namely the Minimum Principle and the Dynamic Programming approach [4], [2], [5], [6], [7], [8]. The former hinges upon the definition of the Hamiltonian associated with the optimal control problem, which must be minimized by the optimal control law. The latter is based on the principle of optimality [1], which formalizes the intuitive requirement that a truncation of the optimal control law must be optimal with respect to the resulting truncated problem. Obviously, each approach has its own advantages and drawbacks.

The solution relying on the Minimum Principle, which provides only necessary conditions for optimality, yields in general an open-loop control law defined in terms of an adjoint state, or costate, satisfying an ordinary differential equation with a boundary condition imposed on the value of the costate at the terminal time. Determining the trajectories of the state and the costate therefore amounts to solving a two-point boundary value problem.
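For the class of systems considered below, these conditions take the familiar textbook form; the running cost ℓ, terminal cost φ and costate λ are introduced here only for illustration, since the precise cost functional is fixed in Section 2:

    H(x, λ, u) = ℓ(x, u) + λ⊤(f(x) + g(x)u),
    ẋ = f(x) + g(x)u,   λ̇ = −(∂H/∂x)⊤,   u°(t) = arg min_u H(x(t), λ(t), u),
    x(0) = x₀,   λ(T) = (∂φ/∂x(x(T)))⊤,

so that the state equation carries an initial condition and the costate equation a terminal one, which is precisely the two-point boundary value structure referred to above.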

The Dynamic Programming approach, on the other hand, provides necessary and sufficient conditions for optimality, and the resulting control policy is a memoryless time-varying feedback. This methodology hinges upon the computation of the solution of the Hamilton–Jacobi–Bellman partial differential equation, a solution which may in general be difficult or impossible to compute in closed form.
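With the same illustrative running cost ℓ, terminal cost φ and value function V(x, t), the equation in question is the standard finite-horizon HJB pde

    −∂V/∂t(x, t) = min_u { ℓ(x, u) + ∂V/∂x(x, t)(f(x) + g(x)u) },   V(x, T) = φ(x),

whose minimizing argument, evaluated along the closed-loop trajectory, is the memoryless time-varying feedback mentioned above.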

The main contribution of this paper is a method to construct dynamically, i.e. by means of a dynamic extension, an exact solution of a (modified) HJB pde for input-affine nonlinear systems without actually solving any partial differential equation. The extended closed-loop system is a system of ordinary differential equations with two-point boundary conditions. However, unlike in the Minimum Principle approach, if an approximate solution is sought then the solution of the two-point boundary value problem can be avoided and the cost of this approximation can be explicitly quantified. A preliminary version of this work has been published in [9].

The rest of the paper is organized as follows. In Section 2 the definition of the problem is given and the basic notation is introduced. A notion of solution of the Hamilton–Jacobi–Bellman partial differential equation is provided in Section 3. The design of a dynamic time-varying state feedback that approximates the solution of the finite-horizon optimal control problem is presented in Section 4. The paper is completed by a numerical example and by some conclusions in Sections 5 and 6, respectively.

Section snippets

Definition of the problem

Consider a nonlinear system described by equations of the form ẋ = f(x) + g(x)u, where x(t) ∈ ℝ^n is the state of the system, while u(t) ∈ ℝ^m denotes the control input. The mappings f: ℝ^n → ℝ^n and g: ℝ^n → ℝ^{n×m} are assumed to be sufficiently smooth. Moreover, suppose that f(0) = 0, hence there exists a (non-unique) continuous matrix-valued function F: ℝ^n → ℝ^{n×n} such that f(x) = F(x)x for all x ∈ ℝ^n. The finite-horizon optimal control problem consists in determining a control input u that minimizes the cost functional
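As an illustration of the non-uniqueness of F, consider f(x) = (x₂, x₁x₂)⊤, the drift of the numerical example in Section 5; both

    F(x) = [0, 1; x₂, 0]   and   F(x) = [0, 1; 0, x₁]

are continuous and satisfy F(x)x = f(x) for all x ∈ ℝ².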

Algebraic solution and value function

In this section a notion of solution of the Hamilton–Jacobi–Bellman equation (3) is presented; see [10], [11], [12] for a similar notation in the case T = +∞. To this end, consider the augmented system ż = F(z) + G(z)u, with z(t) = (x(t), τ(t)) ∈ ℝ^{n+1}, F(z) = (f(x), 1) and G(z) = (g(x), 0). Note that the partial differential equation (3) can be rewritten as V_z(z)F(z) + ½q(x) − ½V_z(z)G(z)G(z)⊤V_z(z)⊤ = 0. Following [11], [12], consider the HJB equation (13) and suppose that it can be solved algebraically, as
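Spelling out the identification behind this rewriting (with V_z understood, as the notation suggests, as the row vector of partial derivatives of V with respect to z = (x, τ)): since F(z) = (f(x), 1) and G(z) = (g(x), 0),

    V_z(z)F(z) = V_x(z)f(x) + V_τ(z),   V_z(z)G(z) = V_x(z)g(x),

so that (13) is the time-varying HJB equation V_τ + V_x f(x) + ½q(x) − ½V_x g(x)g(x)⊤V_x⊤ = 0, with the time variable promoted to a state through the trivial dynamics τ̇ = 1.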

Main results

We first present the main ideas of the proposed approach in the case of linear systems. To this end consider the system (10) and the cost (2) with q(x) = x⊤Qx. From the definition of an algebraic P̄ solution of Eq. (13), we expect p to approximate (in the sense defined in (14)) the partial derivative of the value function with respect to the state x, whereas r represents the partial derivative of V with respect to time. Therefore, in the linear case an algebraic P̄ solution is given by P(τ,x)=[x
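The system (10) and the differential Riccati equation (11) are not reproduced in this excerpt; assuming they are the usual linear system ẋ = Ax + Bu and its associated Riccati equation, the standard linear-quadratic facts appealed to here read

    V(x, τ) = ½x⊤P̄(τ)x,   V_x(x, τ) = x⊤P̄(τ),   V_τ(x, τ) = ½x⊤(dP̄(τ)/dτ)x,

with P̄ the solution of −dP̄/dτ = A⊤P̄ + P̄A + Q − P̄BB⊤P̄ subject to a terminal condition at time T fixed by the terminal penalty; p and r are then expected to play the roles of V_x and V_τ, respectively.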

Numerical example

To illustrate the results of the paper consider the nonlinear system ẋ₁ = x₂, ẋ₂ = x₁x₂ + u, with x(t) = (x₁(t), x₂(t)) ∈ ℝ², u(t) ∈ ℝ, and the cost J(u) = ½∫₀ᵀ u(t)² dt + ½[x₁(T)² + x₂(T)²]. Note that no running cost is imposed on the state of the system, hence only the position of the state at the terminal time is penalized, together with the control effort. Let P̄(τ) = [p̄₁₁(τ), p̄₁₂(τ); p̄₁₂(τ), p̄₂₂(τ)] be the solution of the differential Riccati equation (11) with the boundary condition P̄(T) = I. Note that the
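Since Eq. (11) is not reproduced in this excerpt, the following minimal sketch only illustrates the backward integration step; it assumes the standard LQ differential Riccati equation built from the linearization A = [0, 1; 0, 0] of the drift at the origin, B = (0, 1)⊤, no running state cost (Q = 0) and an arbitrarily chosen horizon T = 5, with P̄(T) = I as above. The matrices, the horizon and the form of the equation are assumptions, not the paper's exact construction.

    # Minimal sketch (not the paper's code): integrate a differential Riccati
    # equation backwards in time from P(T) = I, under the assumptions stated above.
    import numpy as np
    from scipy.integrate import solve_ivp

    A = np.array([[0.0, 1.0],
                  [0.0, 0.0]])      # assumed linearization of the drift at the origin
    B = np.array([[0.0],
                  [1.0]])
    Q = np.zeros((2, 2))            # no running cost on the state
    T = 5.0                         # horizon length (arbitrary choice)

    def riccati_rhs(s, p_flat):
        """Reversed-time right-hand side: with s = T - t and P~(s) = P(T - s),
        dP~/ds equals A'P + PA + Q - P B B' P, i.e. -dP/dt."""
        P = p_flat.reshape(2, 2)
        dP = A.T @ P + P @ A + Q - P @ B @ B.T @ P
        return dP.reshape(-1)

    # the terminal condition P(T) = I becomes an initial condition in reversed time
    sol = solve_ivp(riccati_rhs, (0.0, T), np.eye(2).reshape(-1), dense_output=True)

    def P_bar(t):
        """P̄(t) for 0 <= t <= T, recovered from the reversed-time solution."""
        return sol.sol(T - t).reshape(2, 2)

    print(P_bar(0.0))               # value of P̄ at the initial time

Under these assumptions the computed matrix remains symmetric for all t, consistently with the symmetric parametrization p̄₁₂ used above.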

Conclusions

The finite-horizon optimal control problem for input-affine nonlinear systems is studied within the framework of dynamic programming. It is shown that the explicit computation of the solution of the Hamilton–Jacobi–Bellman partial differential equation can be avoided provided an additional cost is paid. The methodology makes use of a dynamic extension yielding a dynamic time-varying control law that solves the approximate regional dynamic finite-horizon optimal control problem. The closed-loop

References (13)

  • R.E. Bellman, Dynamic Programming (1957)
  • D.P. Bertsekas, Dynamic Programming and Optimal Control, Volume I (2005)
  • A.E. Bryson et al., Applied Optimal Control: Optimization, Estimation, and Control (1975)
  • B.D.O. Anderson et al., Optimal Control: Linear Quadratic Methods (1989)
  • G. Knowles, An Introduction to Applied Optimal Control (1981)
  • L.S. Pontryagin et al., The Mathematical Theory of Optimal Processes (1962)
There are more references available in the full text version of this article.


This work is partially supported by the Austrian Center of Competence in Mechatronics (ACCM), by the EPSRC Programme Grant Control for Energy and Sustainability EP/G066477 and by the MIUR under PRIN Project Advanced Methods for Feedback Control of Uncertain Nonlinear Systems.
