Brief paper
Inverse optimal control for discrete-time finite-horizon Linear Quadratic Regulators☆
Introduction
In nature, agents tend to behave and choose their actions in an optimal way; examples include animal gaits (Alexander, 1984), the timing of hatching queens in bee hives (Alexander, 1996), and the way people buy and sell stocks. All of these “optimal behaviors” can be seen as solutions to specific optimal control problems. The goal of a classical optimal control problem is to find the optimal control input as well as the optimal trajectory when the cost function, system dynamics, and initial conditions are given. In contrast, the objective of an inverse optimal control problem is to “reverse engineer” the cost function, given observations of optimal trajectories or control inputs, for known system dynamics. The cost function is of great interest because it can help one to understand how individuals make decisions and take actions. Moreover, knowledge of an agent’s cost function may enable one to predict the agent’s future behavior. The problem of inverse optimal control was first proposed by Kalman (1964), and has found a multitude of applications (Berret and Jean, 2016, Finn et al., 2016, Mombaur et al., 2010).
This paper is concerned with inverse optimal control for discrete-time Linear Quadratic Regulators (LQRs) over finite time-horizons, i.e., finding the parameters in the quadratic objective function given the discrete-time linear system dynamics and (possibly noisy) observations of the optimal trajectory or control input.
Inverse optimal control for LQR, particularly in the continuous-time infinite time-horizon case, has been studied by a number of authors (Anderson and Moore, 2007, Fujii, 1987, Jameson and Kreindler, 1973). They assume the optimal feedback gain is known exactly and focus on recovering the objective function. It was shown in Boyd, El Ghaoui, Feron, and Balakrishnan (1994) that the search for the matrices $Q$ and $R$ can be formulated via Linear Matrix Inequalities (LMIs) when the feedback gain is known. In contrast to these three works, our aim is to recover the objective function using observed (possibly noisy) trajectories, instead of assuming the feedback gain to be known exactly. In addition, our focus is on discrete-time finite-horizon LQRs, whose control gain matrix is time-varying. This makes the problem harder to tackle. Priess, Conway, Choi, Popovich, and Radcliffe (2015) consider the discrete-time infinite time-horizon case with noisy observations, in which the optimal feedback gain is time-invariant. Their approach is to identify the feedback matrix $K$ and then solve for $Q$ and $R$, similar to the method proposed in Boyd et al. (1994). In the finite time-horizon case, the optimal feedback gain is time-varying, and such an approach is not applicable. Furthermore, the idea of “identify the feedback gains $K_t$, then compute the corresponding $Q$ and $R$” suffers from the huge number of parameters in the identification stage, i.e., the number of $K_t$’s is proportional to the length of the time horizon. In addition, such identification does not use the knowledge that the $K_t$’s are generated by an LQR.
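To make the contrast concrete, the following sketch (with an illustrative system of our own choosing, not taken from the paper) computes the finite-horizon gains by the backward Riccati recursion and checks that the gain sequence is time-varying, unlike the constant gain of the infinite-horizon case:

```python
import numpy as np

# Illustrative 2-state, 1-input system (values are our own, not from the paper).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.eye(1)
N = 20  # horizon length

# Backward Riccati recursion: each step yields one time-varying gain K_t.
P = Q.copy()  # terminal cost taken equal to Q, an illustrative choice
gains = []
for _ in range(N):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    gains.append(K)
    P = Q + A.T @ P @ (A - B @ K)
gains = gains[::-1]  # gains[t] is K_t for t = 0, ..., N-1

# Far from the terminal time the gains approach the stationary infinite-horizon
# gain; near the terminal time they differ, so the gain sequence is time-varying.
gain_is_time_varying = not np.allclose(gains[0], gains[-1])
```

This illustrates why the “identify $K_t$, then compute $Q$” route is costly: the number of gain matrices to estimate grows linearly with the horizon length.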
The contributions of this paper are three-fold. First, we justify the well-posedness of the inverse optimal control problem for LQRs in Section 2. Second, in the noiseless case (in which observations of the optimal trajectory are exact) we provide sufficient conditions for consistent estimation of the cost function, i.e., exact recovery of the matrix $Q$, cf. Section 3. Moreover, inspired by the formulation in Aswani, Shen, and Siddiq (2015), we formulate the search for $Q$ as an optimization problem in the noisy case (in which observations of the optimal trajectory as well as the control input are corrupted by additive noise). We further prove that such a formulation is statistically consistent, cf. Section 4. The proposed method is demonstrated via a number of simulation studies, in which better performance than the method proposed in Keshavarz, Wang, and Boyd (2011) is observed, cf. Section 5. Conclusions are offered in Section 6.
The topic of inverse optimal control has received considerable attention in the literature. Much of the focus, especially in recent years, has been on systems with nonlinear dynamics (of which the LQR problem is a special case). Johnson, Aghasadeghi, and Bretl (2013) consider the continuous-time finite time-horizon case. They analyze the optimality conditions for the optimal control problem and propose a method to minimize the violation of these conditions. Our focus is the inverse optimal control problem for discrete-time LQRs; we address the conditions for well-posedness and identifiability in the noiseless case, and propose a statistically consistent method for recovery in the noisy case, none of which are addressed in Johnson et al. (2013). Similar ideas are used in Bertsimas, Gupta, and Paschalidis (2015) and Keshavarz et al. (2011) for the discrete-time finite-horizon case (since in this case, the problem can also be interpreted as an inverse optimization problem). The optimization problems proposed in the above methods are numerically tractable; nevertheless, it has been pointed out by Aswani et al. (2015) that the approaches used in Bertsimas et al. (2015) and Keshavarz et al. (2011) are not statistically consistent and are sensitive to observation noise. Aswani et al. (2015) present a statistically consistent formulation, but it results in a difficult optimization problem. Molloy, Ford, and Perez (2018) and Molloy, Tsai, Ford, and Perez (2016) also consider the discrete-time finite time-horizon case. They apply Pontryagin’s Maximum Principle (PMP) to the optimal control problem and pose an optimization problem whose constraints are two of the three conditions of PMP; they then minimize the residual of the third PMP condition. In addition, they assume the optimal control input is known exactly, while in our case, the optimal control input can be corrupted by noise. The question of identifiability in the noiseless case, i.e.,
uniqueness of the solution, for this approach is also addressed therein. In a very recent work (Jin, Kulić, Mou, & Hirche, 2018), the authors consider the discrete-time inverse optimal control problem for nonlinear systems when some segments of the trajectory and input observations are missing. The problem of identifiability in the noiseless case is also discussed therein; however, the sufficient conditions for identifiability presented in these three works involve a data-dependent matrix rank criterion, and the existence of data satisfying such sufficient conditions is not addressed. In the worst case, such rank conditions may not be satisfiable, irrespective of the problem data (i.e., observed trajectories). In contrast, we state conditions for the existence of data that in turn satisfies sufficient conditions for identifiability. We also discuss well-posedness of the inverse optimal control problem. In Hatz, Schlöder, and Bock (2012), the continuous-time finite time-horizon case is considered. The authors formulate the problem as a hierarchical nonlinear constrained optimization problem, and two approaches based on PMP and the Karush–Kuhn–Tucker (KKT) conditions are proposed. The idea is to replace the inner layer of the hierarchical optimization problem, i.e., the original optimal control problem, with the PMP or KKT conditions, hence making the problem tractable. Similarly, Pauwels, Henrion, and Lasserre (2016) and Rouot and Lasserre (2017) also consider the continuous-time finite time-horizon case, in which the inverse optimal control problem is studied in the framework of the Hamilton–Jacobi–Bellman (HJB) equation. The problem is formulated as a polynomial optimization problem and solved by a hierarchy of semidefinite relaxations.
In this paper, our focus is the discrete-time finite time-horizon LQR. This problem set-up is more restrictive than the aforementioned works; however, this extra structure allows us to give precise answers to the questions of well-posedness and identifiability. Such questions are not addressed in the more general works cited above, though heuristic regularization techniques are proposed in Pauwels et al. (2016) and Rouot and Lasserre (2017) to avoid undesirable solutions. Furthermore, our focus on the LQR setting allows us to treat the noisy case in a principled way; problems involving noisy observations are not considered in the more general works of Hatz et al. (2012), Pauwels et al. (2016) and Rouot and Lasserre (2017).
In the remainder of the paper, $\mathbb{S}^n_+$ denotes the cone of $n$-dimensional positive semi-definite matrices and $\mathbb{S}^n$ denotes the set of all $n$-dimensional symmetric matrices. $\|\cdot\|_F$ denotes the Frobenius norm of a matrix and $\|\cdot\|_2$ denotes the $\ell_2$ norm of a vector. $\mathbb{S}^n$ is a Hilbert space whose inner product is defined by $\langle X, Y \rangle = \mathrm{tr}(X^{\top}Y)$. We denote by $\otimes$ the Kronecker product, and $\mathrm{vec}(\cdot)$, $\mathrm{vech}(\cdot)$ denote vectorization and half-vectorization, respectively. It holds that $\mathrm{vec}(ABC) = (C^{\top} \otimes A)\,\mathrm{vec}(B)$, and that $\mathrm{vec}(S) = D_n\,\mathrm{vech}(S)$ for $S \in \mathbb{S}^n$, where $D_n$ is the duplication matrix. For a sequence of vectors $x_1, \dots, x_N$, we abbreviate the notation as $\{x_t\}$ when there is no risk of confusion, and we denote by $x$ the stacked vector $(x_1^{\top}, \dots, x_N^{\top})^{\top}$. The $i$’th row of a matrix $A$ is denoted $A_i$.
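The Kronecker and duplication-matrix identities above can be verified numerically; the helper functions below (our own, written for illustration) use the column-major vectorization convention:

```python
import numpy as np

def vec(M):
    # Column-major stacking, matching the convention vec(ABC) = (C' ⊗ A) vec(B).
    return M.flatten(order='F')

def vech(S):
    # Half-vectorization: stack the lower-triangular part column by column.
    n = S.shape[0]
    return np.concatenate([S[j:, j] for j in range(n)])

def duplication_matrix(n):
    # D_n maps vech(S) to vec(S) for symmetric S.
    D = np.zeros((n * n, n * (n + 1) // 2))
    k = 0
    for j in range(n):
        for i in range(j, n):
            D[i + j * n, k] = 1.0  # entry S[i, j] in vec(S)
            D[j + i * n, k] = 1.0  # mirrored entry S[j, i] (same index if i == j)
            k += 1
    return D

rng = np.random.default_rng(0)
A, B, C = rng.standard_normal((3, 4)), rng.standard_normal((4, 2)), rng.standard_normal((2, 5))
assert np.allclose(vec(A @ B @ C), np.kron(C.T, A) @ vec(B))

S = rng.standard_normal((3, 3))
S = S + S.T  # symmetrize
D = duplication_matrix(3)
assert np.allclose(D @ vech(S), vec(S))
```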
Problem formulation and well-posedness
The “forward” optimal LQ problem reads
$$\min_{u_0,\dots,u_{N-1}} \; x_N^{\top} F x_N + \sum_{t=0}^{N-1}\left(x_t^{\top} Q x_t + u_t^{\top} R u_t\right), \quad \text{s.t. } x_{t+1} = A x_t + B u_t,$$
where $Q, F$ are $n$-dimensional positive semidefinite matrices, $R$ is an $m$-dimensional positive definite matrix, and $x_t \in \mathbb{R}^n$, $u_t \in \mathbb{R}^m$. The inverse optimal control problem aims to find $Q$, $R$, $F$ given $A$, $B$, the initial value $x_0$ and (possibly noisy) observations of the optimal trajectory $\{x_t^*\}$ or control input $\{u_t^*\}$. For simplicity, in this paper, we consider the case of $R = I$ and $F = Q$. In addition, it…
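For reference, a minimal numerical sketch of the forward problem (with illustrative matrices of our own choosing, not the paper’s) solves the backward Riccati recursion, simulates the optimal trajectory, and confirms that the optimal cost equals $x_0^{\top} P_0 x_0$:

```python
import numpy as np

# Illustrative data: 2-state system, horizon N = 10 (our own choices).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
Q = np.diag([1.0, 0.1])   # state cost
R = np.array([[0.5]])     # input cost
F = Q                     # terminal cost taken equal to Q
N = 10
x0 = np.array([1.0, 0.0])

# Backward Riccati recursion: P[t] is the cost-to-go matrix at time t.
P = [None] * (N + 1)
K = [None] * N
P[N] = F
for t in range(N - 1, -1, -1):
    K[t] = np.linalg.solve(R + B.T @ P[t + 1] @ B, B.T @ P[t + 1] @ A)
    P[t] = Q + A.T @ P[t + 1] @ (A - B @ K[t])

# Forward pass: generate the optimal trajectory and accumulate the cost.
x, cost = x0, 0.0
for t in range(N):
    u = -K[t] @ x
    cost += x @ Q @ x + u @ R @ u
    x = A @ x + B @ u
cost += x @ F @ x

# The optimal cost equals x0' P[0] x0.
assert np.isclose(cost, x0 @ P[0] @ x0)
```

The trajectories generated this way are exactly the data the inverse problem takes as input.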
Inverse optimal control in the noiseless case
After justifying the well-posedness of the inverse optimal control problem, in this section we consider inverse optimal control for the LQR problem in the noiseless case. It is assumed that we have knowledge of multiple sets of optimal trajectories $\{x_t^*\}$, i.e., $x_{t+1}^* = (A - B K_t) x_t^*$, where $K_t$ is the optimal feedback gain. We omit the superscript “star” in the remainder of this section to shorten the notation.
By PMP, if $\{u_t^*\}$ and $\{x_t^*\}$ are the optimal control and the corresponding…
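As a numerical illustration of these optimality conditions (the system matrices are our own choices, and a cost with a factor 1/2 is assumed so the PMP expressions come out clean), the costate recursion and the stationarity condition can be checked along a Riccati-optimal trajectory:

```python
import numpy as np

# Illustrative system; cost (1/2) Σ (x'Qx + u'Ru) + (1/2) x_N' F x_N.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R, F = np.eye(2), np.eye(1), np.eye(2)
N = 8
x0 = np.array([1.0, -0.5])

# Solve the forward problem by the backward Riccati recursion.
P = [None] * (N + 1)
K = [None] * N
P[N] = F
for t in reversed(range(N)):
    K[t] = np.linalg.solve(R + B.T @ P[t + 1] @ B, B.T @ P[t + 1] @ A)
    P[t] = Q + A.T @ P[t + 1] @ (A - B @ K[t])

xs = [x0]
us = []
for t in range(N):
    us.append(-K[t] @ xs[t])
    xs.append(A @ xs[t] + B @ us[t])

# PMP conditions: costate lam[t] = Q x_t + A' lam[t+1] with lam[N] = F x_N,
# and stationarity R u_t + B' lam[t+1] = 0. The costate equals P[t] x_t.
lam = [None] * (N + 1)
lam[N] = F @ xs[N]
for t in reversed(range(N)):
    lam[t] = Q @ xs[t] + A.T @ lam[t + 1]
    assert np.allclose(R @ us[t] + B.T @ lam[t + 1], 0)  # stationarity
    assert np.allclose(lam[t], P[t] @ xs[t])             # costate-Riccati link
```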
Inverse optimal control in the noisy case
Now we turn our attention to the noisy case. Inspired by Aswani et al. (2015), we first pose the inverse optimal control problem in the noisy case. Suppose the probability space carries independent random vectors (the initial values and the additive observation noises on the trajectory and the control input), distributed according to some unknown distributions. The following assumptions are made in the remainder of the paper:
Assumption 1 , and , .
Assumption 2 , such that , where
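As a toy illustration of estimation under noise (a deliberate simplification, not the paper’s estimator: a single unknown scalar $q$ with $Q = qI$, $R = I$, terminal weight equal to $Q$, and a grid search in place of the paper’s optimization problem), one can fit $q$ by minimizing the squared residual between predicted optimal trajectories and noisy observations:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative system (our own choices); N-step horizon, M noisy trajectories.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
N, M, q_true, noise_std = 15, 50, 2.0, 0.01

def trajectory(q, x0):
    # Optimal closed-loop trajectory for Q = q*I, R = I, terminal weight q*I.
    Q, R = q * np.eye(2), np.eye(1)
    P, gains = Q.copy(), []
    for _ in range(N):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        gains.append(K)
        P = Q + A.T @ P @ (A - B @ K)
    gains = gains[::-1]
    xs = [x0]
    for t in range(N):
        xs.append((A - B @ gains[t]) @ xs[t])
    return np.array(xs)

# Generate noisy observations of optimal trajectories from random initial values.
x0s = rng.standard_normal((M, 2))
obs = [trajectory(q_true, x0) + noise_std * rng.standard_normal((N + 1, 2)) for x0 in x0s]

# Fit q by residual minimization over a grid of candidates.
candidates = np.arange(0.5, 4.01, 0.25)
residuals = [sum(np.sum((trajectory(q, x0) - y) ** 2) for x0, y in zip(x0s, obs))
             for q in candidates]
q_hat = candidates[int(np.argmin(residuals))]
```

With enough observed trajectories the residual minimizer concentrates around the true parameter, which is the kind of behavior a statistical consistency analysis formalizes.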
Numerical examples
To illustrate the performance of the estimation statistically, we consider a series of discrete-time systems sampled from continuous-time systems with a fixed sampling period, where the entries of the continuous-time system matrices are sampled from uniform distributions. The aim of generating systems in this way is to ensure the controllability of the systems. We take a fixed time horizon $N$. The “real” $Q$ is generated from a random matrix whose elements are sampled from the uniform…
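A sketch of this system-generation step (with parameter values of our own choosing, and forward-Euler discretization used for simplicity in place of exact sampling) is:

```python
import numpy as np

rng = np.random.default_rng(2)

# Sample a random continuous-time pair (A_c, B_c), discretize with sampling
# period T, and check controllability of the resulting discrete-time system.
n, m, T = 3, 1, 0.1
A_c = rng.uniform(-1.0, 1.0, size=(n, n))
B_c = rng.uniform(-1.0, 1.0, size=(n, m))

A = np.eye(n) + T * A_c   # forward-Euler discretization
B = T * B_c

# Controllability matrix [B, AB, ..., A^{n-1}B] must have full row rank.
ctrb = np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(n)])
controllable = np.linalg.matrix_rank(ctrb) == n
```

Randomly sampled matrices yield a controllable pair with probability one, which is why generating systems this way ensures controllability in practice.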
Conclusion
In this paper, we have analyzed the inverse optimal control problem for discrete-time LQRs over finite time horizons, considering both the noiseless case (in which observations of the optimal trajectories are exact) and the noisy case (in which such observations are corrupted by additive noise). The well-posedness of this problem, i.e., uniqueness of $Q$ for a given closed-loop dynamics, was first proved. In the noiseless case, we discussed identifiability of the problem and provided sufficient…
Han Zhang received his B.S. and M.S. degrees from the Dept. of Automation in Shanghai Jiaotong University in 2011 and 2014, respectively. He has been a Ph.D. student at the Dept. of Mathematics, KTH Royal Institute of Technology, since 2014. His main research interests are control of multi-agent systems, distributed optimization and optimal control.
References (26)
- Alexander, R. M. (1984). The gaits of bipedal and quadrupedal animals. International Journal of Robotics Research.
- Alexander, R. M. (1996). Optima for animals.
- Alizadeh, F., Haeberly, J.-P. A., & Overton, M. L. (1997). Complementarity and nondegeneracy in semidefinite programming. Mathematical Programming.
- Anderson, B. D. O., & Moore, J. B. (2007). Optimal control: linear quadratic methods.
- Aswani, A., Shen, Z.-J. M., & Siddiq, A. (2015). Inverse optimization with noisy data.
- Berret, B., & Jean, F. (2016). Why don’t we move slower? The value of time in the neural control of action. Journal of Neuroscience.
- Bertsimas, D., Gupta, V., & Paschalidis, I. C. (2015). Data-driven estimation in equilibrium using inverse optimization. Mathematical Programming.
- Boyd, S., El Ghaoui, L., Feron, E., & Balakrishnan, V. (1994). Linear matrix inequalities in system and control theory (Vol. 15).
- Finn, C., Levine, S., & Abbeel, P. (2016). Guided cost learning: Deep inverse optimal control via policy optimization.
- Fujii, T. (1987). A new approach to the LQ design from the viewpoint of the inverse regulator problem. IEEE Transactions on Automatic Control.
- Goodwin, G. C., Seron, M. M., & De Doná, J. A. (2005). Constrained control and estimation: an optimisation approach.
- Hatz, K., Schlöder, J. P., & Bock, H. G. (2012). Estimating parameters in optimal control problems. SIAM Journal on Scientific Computing.
- Molloy, T. L., Ford, J. J., & Perez, T. (2018). Finite-horizon inverse optimal control for discrete-time nonlinear systems. Automatica.
Jack Umenberger received the B.E. (Hons 1) and Ph.D. degrees in Mechatronics Engineering from The University of Sydney, Australia, in 2012 and 2017, respectively. He is currently a postdoctoral researcher at Uppsala University, Sweden. His research interests include system identification, robust adaptive and dual control, and reinforcement learning.
Xiaoming Hu received the B.S. degree from the University of Science and Technology of China in 1983, and the M.S. and Ph.D. degrees from the Arizona State University in 1986 and 1989 respectively. He served as a research assistant at the Institute of Automation, the Chinese Academy of Sciences, from 1983 to 1984. From 1989 to 1990 he was a Gustafsson Postdoctoral Fellow at the Royal Institute of Technology, Stockholm, where he is currently a Professor of Optimization and Systems Theory. His main research interests are in nonlinear control systems, nonlinear observer design, sensing and active perception, motion planning, control of multi-agent systems, and mobile manipulation.
☆ This work is supported partly by the China Scholarship Council. The material in this paper was not presented at any conference. This paper was recommended for publication in revised form by Associate Editor Delin Chu under the direction of Editor Ian R. Petersen.