
Systems & Control Letters

Volume 111, January 2018, Pages 49-57

Dual-loop iterative optimal control for the finite horizon LQR problem with unknown dynamics

https://doi.org/10.1016/j.sysconle.2017.11.002

Abstract

Achieving optimal performance over a finite time horizon has attracted considerable attention in many engineering applications. Among these, the Finite Horizon Linear Quadratic Regulator (FHLQR) formulation for continuous-time linear time-varying (LTV) systems has been well studied, with an optimal solution characterised by the Differential Riccati Equation (DRE). Solving the DRE requires that the exact system dynamics be known. However, this assumption may not always hold, as the plant model might not be completely known or may change over time due to wear and tear. This paper proposes a dual-loop iterative algorithm to find the optimal solution of the FHLQR problem for continuous-time LTV systems. The inner loop applies input trajectories based on an estimate of the optimal control gain with some added excitation noise, and produces measured state trajectories. The outer loop improves the estimate of the optimal control gain using these measured state trajectories. It is shown that, with appropriate selection of the discretisation parameter T and the set of excitation signals, the proposed dual-loop iterative algorithm converges to an arbitrarily small neighbourhood of the optimal solution. A simulation example demonstrates the effectiveness of the proposed method.

Introduction

The optimal control problem, in which the objective is to develop a control strategy that minimises a given cost function subject to the dynamics of a given system, is commonly considered in the control field. The foundation of many solutions to this problem is Bellman’s Optimality Principle [1] and the solution of the Hamilton–Jacobi–Bellman (HJB) Equation. Solving the HJB Equation, however, generally requires precise knowledge of the dynamics, and a closed-form solution often does not exist. Although such model-based algorithms have their place, in real-world systems exact knowledge of the dynamics is often not available. Furthermore, the dynamics can also undergo changes over time, and these changes are more pronounced in some systems than in others.

Within this paper, the well-studied continuous-time Finite Horizon (FH) Linear Quadratic Regulator (LQR) problem is considered. Many engineering tasks are naturally posed in this FHLQR form, where the objective is to minimise a cost function that is quadratic in both the state error and the control effort, over a given finite time period, subject to linear time-varying dynamics. The finite duration specified in many practical tasks maps naturally onto the finite horizon of the controller, which provides an explicit mechanism to trade off the accuracy of task completion against the effort we are willing to spend to achieve it. If the dynamics are known, the optimal control scheme for this problem can be calculated using the Differential Riccati Equation (DRE); it cannot be calculated if the dynamics are unknown. Furthermore, if an inaccurate model is used, or if the dynamics change between iterations (either slowly, for example due to wear and tear, or suddenly due to a part failing), the control scheme becomes suboptimal. Such inaccurate or unknown dynamics are found in many engineered systems, and are exceptionally pronounced in complex biological systems [2], which motivated this work.

As such, this paper proposes an algorithm which solves for the optimal control gain without requiring knowledge of the dynamics. The proposed method uses an iterative process to compute the optimal gain matrix from measured state trajectories, building on the iterative solution to the DRE proposed in [3]. By not assuming knowledge of the system dynamics, the proposed algorithm can re-identify an optimal control strategy should the plant dynamics change. The algorithm proposed here is a direct method: the optimal gain matrix is computed directly from measurements. Such methods are generally simpler than indirect methods, which first identify the dynamics and then compute the optimal gain matrix, as seen in [4], [5].

The proposed algorithm is posed as the solution to the Finite Horizon Linear Quadratic Regulator problem for Linear Time-Varying (LTV) systems. Some solutions to similar problems have been proposed in the literature. In [6], the linear time-invariant (LTI) Infinite Horizon Linear Quadratic Regulator problem was investigated. The present paper takes a similar approach; the major difference is that [6] considers an infinite horizon, time-invariant dynamics and no terminal cost. The algorithm in [6] is proposed as an Adaptive Dynamic Programming (ADP) technique, which uses successive estimates of the value function to estimate the optimal control law; [7] and [8] provide good reviews of existing ADP techniques. Although the algorithm proposed in the present paper follows a similar structure, the authors have chosen not to describe it as ADP, as the value function is not explicitly used to estimate the optimal control gain.

Other approaches to the FHLQR problem also exist, but with discrete-time dynamics: [9] proposes an adaptive algorithm for the discrete-time FHLQR problem with constant dynamics, while [10] uses an extremum-seeking iterative approach to find an open-loop control sequence for the discrete-time FHLQR problem with time-varying dynamics.

Similar problems also exist in Iterative Learning Control, such as the Linear Quadratic Optimal Learning Control [11] and norm-optimal iterative learning control [12], [13], where optimal performance is sought over the finite horizon of each iteration. However, these algorithms seek an optimal control trajectory for the given task, as opposed to an optimal control law for a family of tasks characterised by the given cost function. Fundamentally, they require identical initial conditions for each iteration, and only work when the optimal trajectory is identical across iterations.

The present paper therefore proposes an algorithm to handle the continuous-time case, in which no knowledge of the system dynamics is required beyond the fact that they are linear and time-varying.

For any $x \in \mathbb{R}^n$, $|x| = \sqrt{x^T x}$. For any $A \in \mathbb{R}^{n \times m}$, $|A|$ denotes its induced matrix norm. The set of all continuous functions mapping $[t_0, t_f]$ to $\mathbb{R}^{n \times m}$, for any $n, m \in \mathbb{N}$, is denoted $C^{n \times m}[t_0, t_f]$. For any $A(\cdot) \in C^{n \times m}[t_0, t_f]$, $\|A\|_s = \max_{t_0 \le t \le t_f} |A(t)|$.

For a given $x \in \mathbb{R}^n$ and a given $V = V^T \in \mathbb{R}^{n \times n}$, $x^T V x$ can be written as $\bar{x}^T \bar{v}$ with $\bar{x} \in \mathbb{R}^{n(n+1)/2}$ and $\bar{v} \in \mathbb{R}^{n(n+1)/2}$, where
$$\bar{v} = [V_{11}, 2V_{12}, \ldots, 2V_{1n}, V_{22}, 2V_{23}, \ldots, 2V_{2n}, \ldots, V_{n-1,n-1}, 2V_{n-1,n}, V_{nn}]^T,$$
$$\bar{x} = [x_1^2, x_1 x_2, \ldots, x_1 x_n, x_2^2, x_2 x_3, \ldots, x_2 x_n, \ldots, x_{n-1}^2, x_{n-1} x_n, x_n^2]^T,$$
where $V_{ij}$ represents the element of $V \in \mathbb{R}^{n \times n}$ in the $i$th row and $j$th column, and $x_i$ represents the $i$th element of the vector $x$.
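The quadratic-form identity above can be checked numerically; the sketch below is illustrative only, and the helper names `quad_vec` and `state_vec` are not the paper's notation:

```python
import numpy as np

def quad_vec(V):
    """Stack the upper triangle of symmetric V row by row into v_bar,
    doubling the off-diagonal entries, so that x^T V x = x_bar^T v_bar."""
    n = V.shape[0]
    return np.array([V[i, i] if i == j else 2.0 * V[i, j]
                     for i in range(n) for j in range(i, n)])

def state_vec(x):
    """Stack the monomials x_i * x_j (i <= j) of x into x_bar."""
    n = x.shape[0]
    return np.array([x[i] * x[j] for i in range(n) for j in range(i, n)])

# Verify the identity on a random symmetric V and a random x.
rng = np.random.default_rng(0)
n = 4
M = rng.standard_normal((n, n))
V = (M + M.T) / 2.0
x = rng.standard_normal(n)
assert np.isclose(x @ V @ x, state_vec(x) @ quad_vec(V))
```

Both vectors have length $n(n+1)/2$, which is what makes this parameterisation more economical than stacking all $n^2$ entries of $V$.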

Furthermore, $y^T K x$, where $x \in \mathbb{R}^m$, $y \in \mathbb{R}^n$ and $K \in \mathbb{R}^{n \times m}$, can be written as $(x \otimes y)^T k$ with $x \otimes y \in \mathbb{R}^{nm}$, where
$$x \otimes y = [x_1 y_1, x_1 y_2, \ldots, x_1 y_n, x_2 y_1, \ldots, x_2 y_n, \ldots, x_m y_1, \ldots, x_m y_{n-1}, x_m y_n]^T,$$
$$k = [K_{11}, K_{21}, \ldots, K_{n1}, K_{12}, K_{22}, \ldots, K_{n2}, \ldots, K_{1m}, \ldots, K_{nm}]^T.$$
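This bilinear-form identity is the Kronecker product paired with column-major vectorisation of $K$; a minimal numerical check (helper names `kron_xy` and `vec` are illustrative):

```python
import numpy as np

def kron_xy(x, y):
    """Products x_j * y_i, grouped by j, so that
    y^T K x = kron_xy(x, y) @ vec(K)."""
    return np.kron(x, y)  # [x_1 y_1,...,x_1 y_n, ..., x_m y_1,...,x_m y_n]

def vec(K):
    """Column-major stacking: [K_11,...,K_n1, K_12,...,K_n2, ..., K_1m,...,K_nm]."""
    return K.flatten(order="F")

rng = np.random.default_rng(0)
n, m = 3, 2
K = rng.standard_normal((n, m))
x = rng.standard_normal(m)
y = rng.standard_normal(n)
assert np.isclose(y @ K @ x, kron_xy(x, y) @ vec(K))
```

This works because $y^T K x = \sum_{j} \sum_{i} x_j \, y_i \, K_{ij}$, and both stackings pair the term $x_j y_i$ with $K_{ij}$ at the same index.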


Problem formulation

The systems of interest take the following form:
$$\dot{x}(t) = A(t)x(t) + B(t)u(t), \quad x(t_0) = x_0, \tag{5}$$
where $x(t) \in \mathbb{R}^n$, $u(t) \in \mathbb{R}^m$, and the dynamics matrices satisfy $A(\cdot) \in C^{n \times n}[t_0, t_f]$ and $B(\cdot) \in C^{n \times m}[t_0, t_f]$.
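Such an LTV system can be rolled out numerically; the sketch below uses a plain forward-Euler step, and `simulate_ltv` and its signature are illustrative choices, not the paper's implementation:

```python
import numpy as np

def simulate_ltv(A, B, u, x0, t0, tf, dt=1e-3):
    """Forward-Euler rollout of x_dot = A(t) x + B(t) u(t, x) from x(t0) = x0.
    A, B are callables returning matrices; u is a feedback law u(t, x)."""
    N = int(round((tf - t0) / dt))
    ts = t0 + dt * np.arange(N + 1)
    xs = np.empty((N + 1, x0.size))
    xs[0] = x0
    for i in range(N):
        t = ts[i]
        xs[i + 1] = xs[i] + dt * (A(t) @ xs[i] + B(t) @ u(t, xs[i]))
    return ts, xs
```

For example, with $A(t) = -1$, $B(t) = 0$ and a scalar state, the rollout tracks $x(t) = x_0 e^{-t}$ to within the Euler discretisation error.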

The objective of the FHLQR problem is to minimise the following cost function, subject to the dynamics (5):
$$J(u(\cdot)) = x^T(t_f)\Phi_f x(t_f) + \int_{t_0}^{t_f} \left[ x^T(t)Q(t)x(t) + u^T(t)R(t)u(t) \right] dt,$$
where $x$ is the trajectory resulting from the dynamics (5), $\Phi_f \in \mathbb{R}^{n \times n}$ is symmetric positive semidefinite ($\Phi_f = \Phi_f^T \ge 0$), and $Q(\cdot) \in C^{n \times n}[t_0, t_f]$ is also symmetric positive semidefinite.
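This cost can be approximated along a sampled trajectory with a trapezoidal rule; a minimal sketch, with `lqr_cost` an illustrative name rather than the paper's code:

```python
import numpy as np

def lqr_cost(ts, xs, us, Q, R, Phi_f):
    """Trapezoidal approximation of
      J = x(tf)^T Phi_f x(tf) + integral of x^T Q(t) x + u^T R(t) u over [t0, tf],
    given time samples ts, state samples xs and input samples us (rows).
    Q and R are callables returning the weight matrices at time t."""
    g = np.array([xs[i] @ Q(t) @ xs[i] + us[i] @ R(t) @ us[i]
                  for i, t in enumerate(ts)])
    integral = 0.5 * np.sum((g[1:] + g[:-1]) * np.diff(ts))
    return xs[-1] @ Phi_f @ xs[-1] + integral
```

As a sanity check, a constant unit state with zero input, $Q = I$, $R = I$ and $\Phi_f = 0$ over $[0, 1]$ gives $J = 1$.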

Preliminaries

Kleinman [3, Theorem 8, p. 53] proposed a method of iteratively solving the DRE offline, using a dynamic programming approach. This section introduces that algorithm, and two of its properties are identified for later use in the analysis.
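For context, when the dynamics are known, the DRE can also be solved directly by numerical integration backward from the terminal condition $P(t_f) = \Phi_f$. The sketch below uses a plain backward-Euler step; it is the standard model-based baseline, not Kleinman's iteration or the paper's model-free method, and `solve_dre` is an illustrative name:

```python
import numpy as np

def solve_dre(A, B, Q, R, Phi_f, t0, tf, dt=1e-3):
    """Backward-Euler integration of the Differential Riccati Equation
        -dP/dt = A(t)^T P + P A(t) - P B(t) R(t)^{-1} B(t)^T P + Q(t),
    with terminal condition P(tf) = Phi_f.  Returns times and P(t) samples;
    the optimal FHLQR gain is then K(t) = R(t)^{-1} B(t)^T P(t)."""
    N = int(round((tf - t0) / dt))
    ts = t0 + dt * np.arange(N + 1)
    Ps = np.empty((N + 1,) + Phi_f.shape)
    Ps[N] = Phi_f
    for i in range(N, 0, -1):
        t, P = ts[i], Ps[i]
        Pdot = -(A(t).T @ P + P @ A(t)
                 - P @ B(t) @ np.linalg.solve(R(t), B(t).T @ P) + Q(t))
        Ps[i - 1] = P - dt * Pdot  # step backward in time
    return ts, Ps
```

In the scalar case $A = 0$, $B = 1$, $Q = 0$, $R = 1$, $\Phi_f = 1$ on $[0, 1]$, the DRE has the closed-form solution $P(t) = 1/(2 - t)$, so $P(0) = 0.5$, which the sketch reproduces to within discretisation error.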

Proposed algorithm

This section presents an algorithm which iteratively solves the continuous-time FHLQR problem without requiring explicit knowledge of the dynamics of the system. A high-level overview of the proposed algorithm is first presented, followed by a more detailed analysis of each iteration of the outer loop, and finally the convergence of the algorithm to the optimal control gain.
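The two-loop structure described above can be sketched as a skeleton. Here `rollout` and `update_gain` are placeholder callables: the inner loop applies the current gain estimate plus an excitation signal and records the measured trajectory, while the outer-loop gain update is the paper's contribution and is not reproduced here:

```python
def dual_loop(rollout, update_gain, K0, noises, n_outer):
    """Structural skeleton of a dual-loop scheme (illustrative, not the
    paper's algorithm).  rollout(K, e) applies u = -K(t) x + e(t) to the
    plant and returns the measured trajectory; update_gain(K, trajs)
    is the unspecified outer-loop gain update, supplied by the caller."""
    K = K0
    history = [K0]
    for _ in range(n_outer):
        trajs = [rollout(K, e) for e in noises]  # inner loop: excite and measure
        K = update_gain(K, trajs)                # outer loop: improve the estimate
        history.append(K)
    return history
```

The return value collects the successive gain estimates, which is convenient for plotting convergence against the DRE solution in simulation.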

Simulations

In this section, the performance of the algorithm is demonstrated through a number of simulations. In particular, the simulations illustrate (1) the convergence of the algorithm to a near-optimal solution; (2) potential difficulties in achieving excitation when a smaller T is chosen; and (3) convergence closer to the optimum when a smaller T is chosen, albeit at increased computational cost.

Conclusion

This paper presents an algorithm which addresses the Finite Horizon Linear Quadratic Regulator problem without the need for explicit knowledge of the dynamics of the system. This algorithm utilises a two-loop structure, in which the inner loop (index j) is used to gather information about the system, and the outer loop (index k) is used to make successive approximations of the optimal control gain. This structure may also potentially be used in applications in which the dynamics are slowly varying.

Acknowledgment

This work is supported by Australian Research Council grants DP160104018 and DP130100849.

