Dual-loop iterative optimal control for the finite horizon LQR problem with unknown dynamics
Introduction
The optimal control problem, in which the objective is to develop a control strategy that minimises a given cost function subject to the dynamics of a given system, is commonly considered in the control field. The foundation of many solutions to this problem is Bellman's Optimality Principle [1] and the solution of the Hamilton–Jacobi–Bellman (HJB) equation. Solving the HJB equation, however, generally requires precise knowledge of the dynamics, and a closed-form solution often does not exist. Although such model-based methods have their place, in real-world systems exact knowledge of the dynamics is often not available. Furthermore, the dynamics can change over time, and these changes are more pronounced in some systems than in others.
Within this paper, the well-studied continuous-time Finite Horizon (FH) Linear Quadratic Regulator (LQR) problem is considered. Many engineered systems are naturally posed in this FHLQR form, where the objective is to minimise a cost function quadratic in both state error and control effort over a given finite time period, subject to linear time-varying dynamics. The finite duration of many practical tasks lends itself to the finite horizon of the controller, which provides an explicit mechanism to trade off the accuracy of task completion against the effort we are willing to spend to achieve it. If the dynamics are known, the optimal control scheme for this problem can be calculated using the Differential Riccati Equation (DRE). It cannot be calculated if the dynamics are unknown. Furthermore, if an inaccurate model is used, or if the dynamics change between iterations (either slowly, for example due to wear and tear, or suddenly due to a part failing), the control scheme becomes suboptimal. Such inaccurate or unknown dynamics can be found in engineered systems, and are exceptionally pronounced in complex biological systems [2], which motivated this work.
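As a point of reference for the model-based case, when the dynamics are known the DRE can be integrated backward in time from the terminal weight to obtain the optimal time-varying gain. The sketch below uses an illustrative scalar system (not one from the paper); the substitution $\tau = t_f - t$ turns the terminal-value problem into an initial-value problem.

```python
# Sketch: solving the Differential Riccati Equation (DRE) backward in time
# for a known scalar system, as a baseline for the model-based case.
# The system (a, b) and weights (q, r, s) are illustrative choices.
from scipy.integrate import solve_ivp

a, b = 0.0, 1.0          # scalar dynamics: x' = a*x + b*u
q, r, s = 1.0, 1.0, 0.0  # running state/control weights and terminal weight
t0, tf = 0.0, 10.0

# With tau = tf - t, the DRE -dP/dt = A'P + PA - P B R^{-1} B' P + Q,
# P(tf) = S, becomes dP/dtau = 2aP - (bP)^2/r + q with P(tau=0) = S.
def dre_rhs(tau, p):
    return [2.0 * a * p[0] - (b * p[0]) ** 2 / r + q]

sol = solve_ivp(dre_rhs, [0.0, tf - t0], [s], rtol=1e-8, atol=1e-10)
p_t0 = sol.y[0, -1]      # P at the initial time t0
k_t0 = b * p_t0 / r      # optimal gain K(t0) = R^{-1} B' P(t0)

# Over a long horizon P(t0) approaches the algebraic Riccati solution
# (here p = 1 for a = 0, b = q = r = 1).
print(p_t0, k_t0)
```

For this choice of parameters the exact DRE solution is $P(t) = \tanh(t_f - t)$, so over a ten-second horizon $P(t_0)$ is indistinguishable from the infinite-horizon value.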
As such, this paper proposes an algorithm which solves for the optimal control gain without requiring knowledge of the dynamics. The proposed method uses an iterative process to compute the optimal gain matrix from measured state trajectories, building on the iterative solution to the DRE proposed in [3]. By not assuming knowledge of the system dynamics, the proposed algorithm can re-identify an optimal control strategy should the plant dynamics change. The algorithm proposed here is a direct method, in which the optimal gain matrix is computed directly from measurements. Such methods are generally simpler than indirect methods, which first identify the dynamics and then compute the optimal gain matrix, as seen in [[4], [5]].
The proposed algorithm is posed as the solution to the Finite Horizon Linear Quadratic Regulator problem for Linear Time-Varying (LTV) systems. Some solutions to similar problems have been proposed in the literature. In [6], a linear time-invariant (LTI) Infinite Horizon Linear Quadratic Regulator problem was investigated. The present paper takes a similar approach, with the major difference being that [6] considers an infinite horizon, time-invariant dynamics and no terminal cost. The algorithm in [6] is proposed as an Adaptive Dynamic Programming (ADP) technique, which uses successive estimates of the value function to estimate the optimal control law; [7] and [8] provide good reviews of existing ADP techniques. Although the algorithm proposed in the present paper follows a similar structure, the authors have chosen not to describe it as ADP, as the value function is not explicitly used to estimate the optimal control gain.
Other approaches to the FHLQR problem also exist, but with discrete-time dynamics: [9] proposes an adaptive algorithm for the discrete-time FHLQR problem with constant dynamics, while [10] uses an extremum-seeking iterative approach to find an open-loop control sequence for the discrete-time FHLQR problem with time-varying dynamics.
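For contrast with the continuous-time setting considered here, the discrete-time finite-horizon LQR gain is obtained by a backward Riccati recursion. The scalar system and weights below are illustrative assumptions, not taken from [9] or [10].

```python
# Sketch of the discrete-time finite-horizon LQR backward recursion.
# At each step, K_k = (R + B'P B)^{-1} B'P A and
# P_k = Q + A'P A - A'P B (R + B'P B)^{-1} B'P A, starting from P_N = S.
a, b = 1.0, 1.0          # x_{k+1} = a x_k + b u_k
q, r, s = 1.0, 1.0, 1.0  # stage and terminal weights
N = 50                   # horizon length

p = s
gains = []
for _ in range(N):       # backward pass: gain and cost-to-go from P_{k+1}
    k = (b * p * a) / (r + b * p * b)
    p = q + a * p * a - (a * p * b) ** 2 / (r + b * p * b)
    gains.append(k)
gains.reverse()          # gains[0] is the gain applied at time step 0

# Over a long horizon, p approaches the discrete algebraic Riccati
# fixed point, here the golden ratio (1 + sqrt(5)) / 2.
print(p, gains[0])
```

The time-varying gain schedule `gains` is the discrete analogue of the gain trajectory produced by the DRE in continuous time: early in the horizon the gain is near its stationary value, and it rolls off toward the terminal weighting at the end.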
Similar problems also arise in Iterative Learning Control, such as Linear Quadratic Optimal Learning Control [11] and norm-optimal iterative learning control [[12], [13]], where optimal performance is sought over the finite horizon of each iteration. However, these algorithms seek an optimal control trajectory for the given task, as opposed to an optimal control law for a family of tasks characterised by the given cost function. Fundamentally, they require identical initial conditions for each iteration, and only work when the optimal trajectory is identical across iterations.
The present paper therefore proposes an algorithm to handle the continuous-time case, where no knowledge of the system dynamics is required, other than that they are linear and time-varying.
For any $x \in \mathbb{R}^n$, $\|x\|$ denotes its Euclidean norm. For any $A \in \mathbb{R}^{n \times m}$, $\|A\|$ is its induced matrix norm. The set consisting of all continuous functions defined over $[a, b]$, for any $a < b$, is denoted $C[a, b]$. For any $f \in C[a, b]$, $\|f\|_\infty = \sup_{t \in [a, b]} \|f(t)\|$.
For a given symmetric $P \in \mathbb{R}^{n \times n}$ and a given $x \in \mathbb{R}^n$, the quadratic form $x^\top P x$ can be written as $x^\top P x = \bar{x}^\top \hat{P}$, with $\bar{x} = [x_1^2, x_1 x_2, \ldots, x_1 x_n, x_2^2, x_2 x_3, \ldots, x_n^2]^\top$ and $\hat{P} = [p_{11}, 2p_{12}, \ldots, 2p_{1n}, p_{22}, 2p_{23}, \ldots, p_{nn}]^\top$, where $p_{ij}$ represents the element of $P$ in the $i$th row and $j$th column and $x_i$ represents the $i$th element in vector $x$.
Furthermore, for any $x \in \mathbb{R}^n$, $u \in \mathbb{R}^m$ and $M \in \mathbb{R}^{n \times m}$, the bilinear form $x^\top M u$ can be written as $x^\top M u = (u \otimes x)^\top \mathrm{vec}(M)$, where $\otimes$ denotes the Kronecker product and $\mathrm{vec}(M)$ stacks the columns of $M$ into a single vector.
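Vectorisations of this kind can be checked numerically. The sketch below verifies the full Kronecker-product form $x^\top P x = (x \otimes x)^\top \mathrm{vec}(P)$, which is equivalent to the half-vectorised form up to the merging of duplicated off-diagonal terms; the random matrix and vector are illustrative.

```python
# Numerical check of the quadratic-form vectorisation: for symmetric P,
# x' P x = (x kron x)' vec(P). The half-vectorised form used in the text
# merges the duplicated off-diagonal terms of vec(P).
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
P = (A + A.T) / 2                  # a random symmetric matrix
x = rng.standard_normal(n)

quad = x @ P @ x                   # the quadratic form, evaluated directly
vec_form = np.kron(x, x) @ P.flatten(order="F")  # vectorised evaluation
print(abs(quad - vec_form))
```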
Problem formulation
The systems of interest take the following form: $\dot{x}(t) = A(t)\,x(t) + B(t)\,u(t)$ (5), where $x(t) \in \mathbb{R}^n$ is the state, $u(t) \in \mathbb{R}^m$ is the control input, and the dynamics matrices $A(t) \in \mathbb{R}^{n \times n}$ and $B(t) \in \mathbb{R}^{n \times m}$ are continuous in $t$.
The objective of the FHLQR problem is to minimise the following cost function, subject to the dynamic system (5): $J = x(t_f)^\top S\, x(t_f) + \int_{t_0}^{t_f} \big( x(t)^\top Q\, x(t) + u(t)^\top R\, u(t) \big)\, \mathrm{d}t$, where $x(\cdot)$ is the resulting trajectory from the dynamics (5), $S$ is symmetric positive semidefinite, $Q$ is also symmetric positive semidefinite, and $R$ is symmetric positive definite.
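A cost of this form can be evaluated numerically along any closed-loop trajectory. The sketch below simulates an illustrative scalar time-varying system under a fixed (not optimal) gain and evaluates the finite-horizon cost by quadrature; the dynamics, gain, and weights are assumptions for illustration only.

```python
# Sketch: evaluating the finite-horizon quadratic cost along a closed-loop
# trajectory of a scalar time-varying system. A(t), B(t), the gain K, and
# the weights are illustrative assumptions, not the paper's example.
import numpy as np
from scipy.integrate import solve_ivp

t0, tf = 0.0, 5.0
q, r, s = 1.0, 0.1, 1.0
K = 2.0                                   # a fixed stabilising gain

def A(t): return -0.5 + 0.2 * np.sin(t)  # scalar time-varying dynamics
def B(t): return 1.0 + 0.1 * np.cos(t)

def closed_loop(t, x):
    u = -K * x[0]
    return [A(t) * x[0] + B(t) * u]

sol = solve_ivp(closed_loop, [t0, tf], [1.0], dense_output=True, rtol=1e-8)
ts = np.linspace(t0, tf, 2001)
xs = sol.sol(ts)[0]
us = -K * xs

# J = x(tf)' S x(tf) + integral of (x'Qx + u'Ru) dt, trapezoidal rule
integrand = q * xs ** 2 + r * us ** 2
J = s * xs[-1] ** 2 + float(np.sum((integrand[1:] + integrand[:-1]) / 2
                                   * np.diff(ts)))
print(J)
```

Sweeping the gain `K` in such a simulation makes the trade-off encoded by $Q$ and $R$ concrete: a larger gain drives the state down faster but is penalised through the control-effort term.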
Preliminaries
Kleinman [3, Theorem 8, page 53] proposed a method of iteratively solving the DRE offline, using a dynamic programming approach. This section introduces the algorithm, and two of its properties are identified for later use in the analysis.
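A sketch of this style of successive approximation is given below: each pass freezes the current gain, solves the resulting linear Lyapunov differential equation backward from the terminal weight, and then updates the gain from the solution. Whether this matches Kleinman's recursion term for term is an assumption; the scalar system is illustrative, chosen so the true DRE solution is $P(t) = \tanh(t_f - t)$.

```python
# Sketch of Kleinman-style successive approximation of the DRE solution:
# given a gain K_i(t), solve the linear Lyapunov differential equation
#   -dP_i/dt = (A - B K_i)' P_i + P_i (A - B K_i) + Q + K_i' R K_i,
# with P_i(tf) = S, then update K_{i+1} = R^{-1} B' P_i.
# Scalar example with a = 0, b = q = r = 1, s = 0, where the exact DRE
# solution is P(t) = tanh(tf - t); tau = tf - t below.
import numpy as np
from scipy.integrate import solve_ivp

a, b, q, r, s = 0.0, 1.0, 1.0, 1.0, 0.0
T = 2.0                  # horizon length tf - t0

p_fn = lambda tau: 0.0   # initial guess: zero value function (zero gain)

for _ in range(8):
    p_prev = p_fn
    def rhs(tau, p, p_prev=p_prev):
        ki = b * p_prev(tau) / r                  # current gain K_i(tau)
        return [2.0 * (a - b * ki) * p[0] + q + r * ki ** 2]
    sol = solve_ivp(rhs, [0.0, T], [s], dense_output=True, rtol=1e-9)
    p_fn = lambda tau, sol=sol: float(sol.sol(tau)[0])

p_t0 = p_fn(T)           # approximation of P at the initial time t0
print(p_t0, np.tanh(T))
```

Each iteration only requires solving a linear differential equation, and the sequence converges rapidly (quadratically, in the Newton-iteration sense) to the DRE solution.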
Proposed algorithm
This section presents an algorithm which iteratively solves the continuous-time FHLQR problem without requiring explicit knowledge of the dynamics of the system. A high-level overview of the proposed algorithm is first presented, followed by a more detailed analysis of each iteration of the outer loop, and finally the convergence of the algorithm to the optimal control gain.
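The flavour of computing quantities directly from measured data can be illustrated with a toy least-squares problem: if values of a quadratic form are observed at sampled (sufficiently exciting) states, the underlying symmetric matrix can be recovered without any model. This is a generic illustration of the data-driven step, not the paper's actual inner-loop computation.

```python
# Toy illustration: estimating an unknown symmetric matrix P from
# measured values of the quadratic form x' P x, via least squares on
# the Kronecker regressor (x kron x). Generic sketch only; not the
# paper's algorithm.
import numpy as np

rng = np.random.default_rng(1)
n = 3
A = rng.standard_normal((n, n))
P_true = (A + A.T) / 2                      # unknown symmetric "parameter"

X = rng.standard_normal((50, n))            # sampled states (excitation)
y = np.einsum("ki,ij,kj->k", X, P_true, X)  # measured quadratic-form values

Phi = np.stack([np.kron(x, x) for x in X])  # one regressor row per sample
theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
P_hat = theta.reshape(n, n)
P_hat = (P_hat + P_hat.T) / 2               # symmetrise the estimate

print(np.max(np.abs(P_hat - P_true)))
```

The role of excitation is visible here: if the sampled states do not span enough directions, the least-squares problem becomes rank-deficient and the estimate is no longer unique, mirroring the excitation requirements discussed in the simulations.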
Simulations
In this section, the performance of the algorithm is shown through a number of simulations. In particular, the simulations illustrate (1) the convergence of the algorithm to near-optimal performance; (2) potential difficulties in achieving sufficient excitation when a smaller parameter value is chosen; and (3) convergence closer to the optimum when a smaller parameter value is chosen, albeit at increased computational cost.
Conclusion
This paper presents an algorithm which addresses the Finite Horizon Linear Quadratic Regulator problem without the need for explicit knowledge of the dynamics of the system. This algorithm utilises a two-loop structure, in which the inner loop is used to gather information about the system, and the outer loop is used to make successive approximations of the optimal control gain. This structure may also potentially be used in applications in which the dynamics are slowly varying.
Acknowledgment
This work is supported by Australian Research Council grants DP160104018 and DP130100849.
References (17)
- et al., Computational adaptive optimal control for continuous-time linear systems with completely unknown dynamics, Automatica (2012)
- et al., Finite-horizon LQ control for unknown discrete-time linear systems via extremum seeking, Eur. J. Control (2013)
- et al., On the design of ILC algorithms using optimization, Automatica (2001)
- et al., Model-based iterative learning control with a quadratic criterion for time-varying linear systems, Automatica (2000)
- et al., A converse Lyapunov theorem for discrete-time systems with disturbances, Systems Control Lett. (2002)
- Dynamic Programming (1957)
- et al., Learning control in robot-assisted rehabilitation of motor skills – a review, J. Control Decis. (2016)
- Suboptimal Design of Linear Regulator Systems Subject to Computer Storage Limitations (1967)