Automatica
Volume 110, December 2019, 108593

Brief paper
Inverse optimal control for discrete-time finite-horizon Linear Quadratic Regulators

https://doi.org/10.1016/j.automatica.2019.108593

Abstract

In this paper, we consider the inverse optimal control problem for discrete-time Linear Quadratic Regulators (LQRs) over finite time-horizons. Given observations of the optimal trajectories, or optimal control inputs, of a linear time-invariant system, the goal is to infer the parameters that define the quadratic cost function. The well-posedness of the inverse optimal control problem is first justified. In the noiseless case, when these observations are exact, we analyze the identifiability of the problem and provide sufficient conditions for uniqueness of the solution. In the noisy case, when the observations are corrupted by additive zero-mean noise, we formulate the problem as an optimization problem and prove that the solution to this problem is statistically consistent. The performance of the proposed method is illustrated through numerical examples.

Introduction

In nature, agents tend to behave and choose their actions in an optimal way; examples include animal gaits (Alexander, 1984), the timing of queen hatching in bee hives (Alexander, 1996), and the way people buy and sell stocks. All of these “optimal behaviors” can be seen as solutions to specific optimal control problems. The goal of a classical optimal control problem is to find the optimal control input as well as the optimal trajectory when the cost function, system dynamics, and initial conditions are given. In contrast, the objective of an inverse optimal control problem is to “reverse engineer” the cost function, given observations of optimal trajectories or control inputs, for known system dynamics. The cost function is of great interest because it can help one to understand how individuals make decisions and take actions. Moreover, knowledge of an agent’s cost function may enable one to predict the future behavior of the agent. The problem of inverse optimal control was first proposed by Kalman (1964), and has found a multitude of applications (Berret and Jean, 2016, Finn et al., 2016, Mombaur et al., 2010).

This paper is concerned with inverse optimal control for discrete-time Linear Quadratic Regulators (LQRs) over finite time-horizons, i.e., finding the parameters in the quadratic objective function given the discrete-time linear system dynamics and (possibly noisy) observations of the optimal trajectory or control input.

Inverse optimal control for LQR, particularly in the continuous infinite time-horizon case, has been studied by a number of authors (Anderson and Moore, 2007, Fujii, 1987, Jameson and Kreindler, 1973). These works assume the optimal feedback gain $K$ is known exactly and focus on recovering the objective function. It was shown in Boyd, El Ghaoui, Feron, and Balakrishnan (1994) that the search for matrices $Q$ and $R$ can be formulated with Linear Matrix Inequalities (LMIs) when the feedback gain $K$ is known. In contrast to these works, our aim is to recover the objective function from observed (possibly noisy) trajectories, instead of assuming $K$ to be known exactly. In addition, our focus is on discrete-time finite-horizon LQRs, whose control gain matrix $K_t$ is time-varying. This makes the problem harder to tackle. Priess, Conway, Choi, Popovich, and Radcliffe (2015) consider the discrete infinite time-horizon case with noisy observations, in which the optimal feedback gain $K$ is time-invariant. Their approach is to identify the feedback matrix $K$ and then solve for $Q$ and $R$, similar to the method proposed in Boyd et al. (1994). In the finite time-horizon case, the optimal feedback gain $K_t$ is time-varying, and such an approach is not applicable. Furthermore, the idea of “identify the feedback gains $K_t$, then compute the corresponding $Q$” suffers from the large number of parameters in the identification stage, i.e., the number of $K_t$’s is proportional to the length of the time horizon. In addition, such identification does not use the knowledge that the $K_t$’s are generated by an LQR.
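To make the known-gain approach concrete, below is a minimal sketch of a discrete-time analogue of that LMI idea (Boyd et al.'s original formulation treats the continuous-time case): with a stationary gain $K$ known and the normalization $R = I$, the closed-loop Lyapunov equation and the Riccati stationarity condition are linear in $(P, Q)$, so the search for $Q$ becomes a semidefinite feasibility problem. The function name and the discrete-time adaptation are ours; the snippet assumes cvxpy with an SDP-capable solver (e.g., SCS) is available.

```python
import cvxpy as cp
import numpy as np

def recover_cost_from_gain(A, B, K, eps=1e-6):
    """Semidefinite feasibility search for Q given a stationary gain K.

    Assumes u = -K x, the normalization R = I, and that K is the optimal
    infinite-horizon LQR gain for some Q >= 0. A sketch of the LMI idea
    from Boyd et al. (1994), adapted here to discrete time.
    """
    n, m = B.shape
    P = cp.Variable((n, n), symmetric=True)
    Q = cp.Variable((n, n), symmetric=True)
    Acl = A - B @ K                          # closed-loop dynamics
    constraints = [
        P >> eps * np.eye(n),
        Q >> 0,
        # Closed-loop Lyapunov equation: P = Q + K' R K + Acl' P Acl (R = I)
        P == Q + K.T @ K + Acl.T @ P @ Acl,
        # Riccati stationarity rearranged to be linear in P:
        # K = (I + B'PB)^{-1} B'PA  <=>  B' P (A - B K) = K
        B.T @ P @ Acl == K,
    ]
    cp.Problem(cp.Minimize(0), constraints).solve()
    return Q.value, P.value
```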

The contributions of this paper are three-fold. First, we justify the well-posedness of the inverse optimal control problem for LQRs in Section 2. Second, in the noiseless case (in which observations of the optimal trajectory are exact) we provide sufficient conditions for consistent estimation of the cost function, i.e., exact recovery of the matrix $Q$; cf. Section 3. Third, inspired by the formulation in Aswani, Shen, and Siddiq (2015), we formulate the search for $Q$ as an optimization problem in the noisy case (in which observations of the optimal trajectory as well as the control input are corrupted by additive noise), and we prove that this formulation is statistically consistent; cf. Section 4. The proposed method is demonstrated via a number of simulation studies, in which better performance than that of the method proposed in Keshavarz, Wang, and Boyd (2011) is observed; cf. Section 5. Conclusions are offered in Section 6.

The topic of inverse optimal control has received considerable attention in the literature. Much of the focus, especially in recent years, has been on systems with nonlinear dynamics (of which the LQR problem is a special case). The authors of Johnson, Aghasadeghi, and Bretl (2013) consider the continuous finite time-horizon case. They analyze the optimality conditions for the optimal control problem and propose a method to minimize the violation of these conditions. Our focus is the inverse optimal control problem for discrete-time LQRs: we address the conditions for well-posedness and identifiability in the noiseless case, and propose a statistically consistent method for recovery in the noisy case, none of which is addressed in Johnson et al. (2013). Similar ideas are used in Bertsimas, Gupta, and Paschalidis (2015) and Keshavarz et al. (2011) for the discrete-time finite-horizon case (in which the problem can also be interpreted as an inverse optimization problem). The optimization problems proposed in these methods are numerically tractable; nevertheless, it has been pointed out by Aswani et al. (2015) that the approaches used in Bertsimas et al. (2015) and Keshavarz et al. (2011) are not statistically consistent and are sensitive to observation noise. Aswani et al. (2015) present a statistically consistent formulation, which, however, results in a difficult optimization problem.

Molloy, Ford, and Perez (2018) and Molloy, Tsai, Ford, and Perez (2016) also consider the discrete finite time-horizon case. They consider Pontryagin’s Maximum Principle (PMP) for the optimal control problem and pose an optimization problem whose constraints are two of the three conditions of PMP; they then minimize the residual of the third PMP condition. In addition, they assume the optimal control input is known exactly, while in our case the optimal control input can be corrupted by noise. The question of identifiability in the noiseless case, i.e., uniqueness of the solution, for this approach is also addressed therein. In a very recent work (Jin, Kulić, Mou, & Hirche, 2018), the authors consider the discrete-time inverse optimal control problem for nonlinear systems when some segments of the trajectory and input observations are missing. The problem of identifiability in the noiseless case is also discussed therein. However, the sufficient conditions for identifiability presented in these three works involve a rank criterion on a data-dependent matrix, and the existence of data satisfying such conditions is not addressed; in the worst case, such rank conditions may not be satisfiable, irrespective of the problem data (i.e., the observed trajectories). In contrast, we state conditions for the existence of data that in turn satisfies sufficient conditions for identifiability. We also discuss well-posedness of the inverse optimal control problem.

In Hatz, Schlöder, and Bock (2012), the continuous finite time-horizon case is considered. The authors formulate the problem as a hierarchical nonlinear constrained optimization problem, and two approaches based on PMP and on the Karush–Kuhn–Tucker (KKT) conditions are proposed. The idea is to replace the inner layer of the hierarchical optimization problem, i.e., the original optimal control problem, with the PMP or KKT conditions, hence making the problem tractable.
Similarly, Pauwels, Henrion, and Lasserre (2016) and Rouot and Lasserre (2017) also consider the continuous finite time-horizon case, in which the inverse optimal control problem is studied in the framework of the Hamilton–Jacobi–Bellman (HJB) equation. The problem is formulated as a polynomial optimization problem and solved by a hierarchy of semidefinite relaxations.

In this paper, our focus is the discrete-time finite time-horizon LQR. This problem set-up is more restrictive than those of the aforementioned works; however, the extra structure allows us to give precise answers to the questions of well-posedness and identifiability. Such questions are not addressed in the more general works cited above, though heuristic regularization techniques are proposed in Pauwels et al. (2016) and Rouot and Lasserre (2017) to avoid undesirable solutions. Furthermore, our focus on the LQR setting allows us to treat the noisy case in a principled way; problems involving noisy observations are not considered in the more general works of Hatz et al. (2012), Pauwels et al. (2016), and Rouot and Lasserre (2017).

In the remainder of the paper, $\mathbb{S}_+^n$ denotes the cone of $n$-dimensional positive semi-definite matrices and $\mathbb{S}^n$ denotes the set of all $n$-dimensional symmetric matrices. $\|\cdot\|_F$ denotes the Frobenius norm of a matrix and $\|\cdot\|$ denotes the $\ell_2$ norm of a vector. It holds that $\mathbb{S}_+^n \subset \mathbb{S}^n \subset \mathbb{R}^{n \times n}$, and $\mathbb{R}^{n \times n}$ is a Hilbert space whose inner product is defined by $\langle G_1, G_2 \rangle = \mathrm{tr}(G_1^T G_2)$, so that $\langle G_1, G_1 \rangle = \|G_1\|_F^2$. We denote by $\otimes$ the Kronecker product, and $\mathrm{vec}(\cdot)$, $\mathrm{vech}(\cdot)$ denote vectorization and half-vectorization, respectively. It holds that $\mathrm{vec}(G) = D\,\mathrm{vech}(G)$, where $G \in \mathbb{S}^n$ and $D$ is the duplication matrix. It also holds that $\mathrm{vec}(G_1 G_2 G_3) = (G_3^T \otimes G_1)\,\mathrm{vec}(G_2)$, where $G_1 \in \mathbb{R}^{k \times l}$, $G_2 \in \mathbb{R}^{l \times m}$, $G_3 \in \mathbb{R}^{m \times n}$. For a sequence of vectors $\{x_t^{(i)}\}_{t=1:N}^{i=1:M}$, where $x_t^{(i)} \in \mathbb{R}^n$, we abbreviate the notation as $x_{1:N}^{(1:M)}$ when there is no risk of confusion, and we denote $\mathrm{vec}(x_{1:N}^{(1:M)}) = [\mathrm{vec}(x_{1:N}^{(1)})^T, \ldots, \mathrm{vec}(x_{1:N}^{(M)})^T]^T$, where $\mathrm{vec}(x_{1:N}^{(i)}) = [x_1^{(i)T}, \ldots, x_N^{(i)T}]^T$. The $i$th row of a matrix $G$ is denoted $[G]_i$.
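As a quick sanity check of these identities (our own illustration, not from the paper), the following numpy snippet verifies the Kronecker-product rule and the duplication-matrix relation on small random matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
vec = lambda G: G.reshape(-1, order="F")      # column-major stacking

# vec(G1 G2 G3) == (G3^T kron G1) vec(G2)
G1, G2, G3 = rng.normal(size=(3, 4)), rng.normal(size=(4, 5)), rng.normal(size=(5, 2))
assert np.allclose(vec(G1 @ G2 @ G3), np.kron(G3.T, G1) @ vec(G2))

# vec(G) == D vech(G) for a symmetric 2x2 G, with D the duplication matrix
G = rng.normal(size=(2, 2)); G = G + G.T
vech = np.array([G[0, 0], G[1, 0], G[1, 1]])  # lower triangle, columnwise
D = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 1, 0],
              [0, 0, 1]])
assert np.allclose(vec(G), D @ vech)
```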

Section snippets

Problem formulation and well-posedness

The “forward” optimal LQ problem reads
$$\min_{x_{1:N},\,u_{1:N-1}} \; J = x_N^T S x_N + \sum_{t=1}^{N-1} \left( u_t^T R u_t + x_t^T Q x_t \right) \quad \text{s.t.}\;\; x_{t+1} = A x_t + B u_t,\;\; x_1 = \bar{x},$$
where $S, Q$ are $n$-dimensional positive semidefinite matrices, $R$ is an $m$-dimensional positive definite matrix, $x_t \in \mathbb{R}^n$ and $u_t \in \mathbb{R}^m$. The inverse optimal control problem aims to find $(S, Q, R)$ given $(A, B)$, the initial value $x_1 = \bar{x}$ and (possibly noisy) observations of the optimal trajectory $x_{2:N}^\star$ or control input $u_{1:N-1}^\star$. For simplicity, in this paper, we consider the case of $R = I$ and $S = 0$. In addition, it …
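The snippet breaks off above. For later reference, here is a minimal sketch of the forward problem: the standard backward Riccati recursion yields the time-varying gains $K_t$ and, via closed-loop simulation, the optimal trajectory and input. The sign convention $u_t = -K_t x_t$ and the function names are our choices.

```python
import numpy as np

def lqr_finite_horizon(A, B, Q, R, S, N):
    """Backward Riccati recursion for the finite-horizon LQR above.

    Returns the time-varying gains K_1, ..., K_{N-1} such that the
    optimal control is u_t = -K_t x_t (sign convention chosen here).
    """
    P = S.copy()                               # P_N = S (terminal cost)
    gains = [None] * (N - 1)
    for t in range(N - 1, 0, -1):              # t = N-1, ..., 1
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ A - A.T @ P @ B @ K
        gains[t - 1] = K
    return gains

def rollout(A, B, gains, x0):
    """Closed-loop simulation giving the optimal trajectory and input."""
    xs, us = [x0], []
    for K in gains:
        us.append(-K @ xs[-1])
        xs.append(A @ xs[-1] + B @ us[-1])
    return np.array(xs), np.array(us)
```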

Inverse optimal control in the noiseless case

After justifying the well-posedness of the inverse optimal control problem, in this section we consider inverse optimal control for the LQR problem in the noiseless case. It is assumed that we have knowledge of $M$ sets of optimal trajectories $\{x_{1:N}^{(i)\star}, u_{1:N-1}^{(i)\star}\}_{i=1}^M$, i.e., $u_t^{(i)\star} = K_t^\star x_t^{(i)\star}$, where $K_t^\star$ is the optimal feedback gain. We omit the superscript “star” in the remainder of this section to shorten the notation.

By PMP, if $u_{1:N-1}$ and $x_{1:N}$ are the optimal control and corresponding …
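The snippet is truncated here. To illustrate how the PMP conditions can be exploited in the noiseless case, the following sketch (ours, not the paper's exact construction) treats $\mathrm{vech}(Q)$ and the costates as joint unknowns of a linear system assembled from one exact trajectory, under $R = I$ and $S = 0$. Whether the solution is unique is precisely the identifiability question of this section; additional trajectories stack further block rows sharing the $\mathrm{vech}(Q)$ columns.

```python
import numpy as np

def duplication(n):
    """Duplication matrix D with vec(G) = D @ vech(G) (notation section)."""
    pairs = [(i, j) for j in range(n) for i in range(j, n)]   # vech order
    pos = {p: k for k, p in enumerate(pairs)}
    D = np.zeros((n * n, n * (n + 1) // 2))
    for j in range(n):
        for i in range(n):
            D[i + j * n, pos[(max(i, j), min(i, j))]] = 1.0
    return D

def recover_Q_noiseless(A, B, xs, us):
    """Recover Q from one exact optimal trajectory via the PMP conditions
    (R = I, S = 0). Unknowns: vech(Q) and costates lam_2, ..., lam_{N-1}
    (lam_N = 0 because S = 0). The PMP conditions
        2 u_t + B' lam_{t+1} = 0,             t = 1, ..., N-2,
        lam_t - A' lam_{t+1} - 2 Q x_t = 0,   t = 2, ..., N-1,
    are linear in the unknowns, so exact data yields Q by least squares
    whenever the identifiability conditions hold.
    xs: (N, n) states x_1..x_N; us: (N-1, m) inputs u_1..u_{N-1}.
    """
    n, m = B.shape
    N = xs.shape[0]
    D, nq = duplication(n), n * (n + 1) // 2
    width = nq + (N - 2) * n
    lam = lambda t: nq + (t - 2) * n           # column offset of lam_t
    rows, rhs = [], []
    for t in range(1, N - 1):                  # stationarity conditions
        row = np.zeros((m, width))
        row[:, lam(t + 1):lam(t + 1) + n] = B.T
        rows.append(row); rhs.append(-2.0 * us[t - 1])
    for t in range(2, N):                      # costate recursion
        row = np.zeros((n, width))
        row[:, lam(t):lam(t) + n] = np.eye(n)
        if t <= N - 2:                         # lam_N = 0 drops the last term
            row[:, lam(t + 1):lam(t + 1) + n] = -A.T
        # -2 Q x_t = -2 (x_t' kron I_n) D vech(Q)
        row[:, :nq] = -2.0 * np.kron(xs[t - 1], np.eye(n)) @ D
        rows.append(row); rhs.append(np.zeros(n))
    z, *_ = np.linalg.lstsq(np.vstack(rows), np.concatenate(rhs), rcond=None)
    return (D @ z[:nq]).reshape(n, n, order="F")
```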

Inverse optimal control in the noisy case

Now we turn our attention to the noisy case. Inspired by Aswani et al. (2015), we first pose the inverse optimal control problem in the noisy case. Suppose the probability space $(\Omega, \mathcal{F}, P)$ carries independent random vectors $\bar{x} \in \mathbb{R}^n$, $\{v_t \in \mathbb{R}^n\}_{t=2}^{N}$ and $\{w_t \in \mathbb{R}^m\}_{t=1}^{N-1}$, distributed according to some unknown distributions. The following assumptions are made in the remainder of the paper:

Assumption 1

$\mathbb{E}(\|\bar{x}\|^2) < +\infty$, $\mathbb{E}(v_t) = \mathbb{E}(w_{t-1}) = 0$, and $\mathbb{E}(\|v_t\|^2) < +\infty$, $\mathbb{E}(\|w_{t-1}\|^2) < +\infty$, $\forall t = 2:N$.

Assumption 2

$\forall \eta \in \mathbb{R}^n$, $\exists\, r(\eta) \in \mathbb{R} \setminus \{0\}$ such that $P\big(\bar{x} \in \mathcal{B}_\varepsilon(r\eta)\big) > 0$, $\forall \varepsilon > 0$, where $\mathcal{B}_\varepsilon(r$ …
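The assumption is cut off at this point. To give a feel for the noisy-case formulation, here is a naive sketch of the underlying bilevel idea: choose $Q \succeq 0$ whose optimal trajectories and inputs best match the noisy observations. The Cholesky-style parameterization, the derivative-free solver, and the function names are our illustrative choices; the paper's actual estimator and its consistency proof are developed in Section 4. The sketch reuses `lqr_finite_horizon` and `rollout` from the forward-problem sketch in Section 2.

```python
import numpy as np
from scipy.optimize import minimize

def estimate_Q(A, B, x0s, ys, ws, N):
    """Naive sketch of the noisy-case idea: pick Q >= 0 whose optimal
    trajectories/inputs best match the noisy observations.

    x0s: known initial states; ys[i]: noisy x_2..x_N of trajectory i;
    ws[i]: noisy u_1..u_{N-1}. Parameterizing Q = L @ L.T keeps the
    iterate positive semidefinite; Nelder-Mead suffices for tiny n.
    """
    n, m = B.shape
    tril = np.tril_indices(n)

    def loss(theta):
        L = np.zeros((n, n)); L[tril] = theta
        Q = L @ L.T
        gains = lqr_finite_horizon(A, B, Q, np.eye(m), np.zeros((n, n)), N)
        J = 0.0
        for x0, y, w in zip(x0s, ys, ws):
            xs, us = rollout(A, B, gains, x0)          # xs includes x_1
            J += np.sum((xs[1:] - y) ** 2) + np.sum((us - w) ** 2)
        return J

    res = minimize(loss, np.eye(n)[tril], method="Nelder-Mead")
    L = np.zeros((n, n)); L[tril] = res.x
    return L @ L.T
```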

Numerical examples

To illustrate the performance of the estimation statistically, we consider a series of discrete-time systems sampled from continuous-time systems $\dot{x} = \hat{A} x + \hat{B} u$ with sampling period $\Delta t = 0.1$, where
$$\hat{A} = \begin{bmatrix} 0 & 1 \\ a_1 & a_2 \end{bmatrix}, \quad \hat{B} = \begin{bmatrix} 0 \\ 1 \end{bmatrix},$$
and $a_1, a_2$ are sampled from uniform distributions on $[-3, 3]$. The aim of generating systems in this way is to ensure the controllability of the systems. We take the time horizon $N = 50$. The “real” $\bar{Q}$ is generated by letting $\bar{Q} = Q_1 Q_1^T$, where each element of $Q_1 \in \mathbb{R}^2$ is sampled from the uniform …
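Although the snippet is truncated, the experimental setup it describes can be sketched as follows; zero-order-hold discretization and the uniform support $[0, 1]$ for the entries of $Q_1$ are our assumptions where the text is cut off.

```python
import numpy as np
from scipy.signal import cont2discrete

rng = np.random.default_rng(0)

def sample_system(dt=0.1):
    """One random plant from the experiment family. The paper says the
    systems are 'sampled with period 0.1'; zero-order hold is assumed."""
    a1, a2 = rng.uniform(-3.0, 3.0, size=2)
    Ac = np.array([[0.0, 1.0], [a1, a2]])
    Bc = np.array([[0.0], [1.0]])
    A, B, *_ = cont2discrete((Ac, Bc, np.eye(2), np.zeros((2, 1))), dt)
    return A, B

def sample_Q():
    """Ground truth Q_bar = Q1 @ Q1.T with Q1 in R^2. The support of the
    uniform distribution for Q1 is cut off in the snippet, so [0, 1] is
    used here as a placeholder assumption."""
    Q1 = rng.uniform(0.0, 1.0, size=(2, 1))
    return Q1 @ Q1.T
```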

Conclusion

In this paper, we have analyzed the inverse optimal control problem for discrete-time LQR over finite-time horizons, considering both the noiseless case (in which observations of the optimal trajectories are exact) and the noisy case (in which such observations are corrupted by additive noise). The well-posedness of this problem, i.e., uniqueness of $Q$ for a given closed-loop dynamics, was first proved. In the noiseless case, we discussed identifiability of the problem and provided sufficient conditions for uniqueness of the solution …


References

  • Molloy, T. L., et al. Finite-horizon inverse optimal control for discrete-time nonlinear systems. Automatica (2018)
  • Alexander, R. M. The gaits of bipedal and quadrupedal animals. International Journal of Robotics Research (1984)
  • Alexander, R. M. Optima for animals (1996)
  • Alizadeh, F., et al. Complementarity and nondegeneracy in semidefinite programming. Mathematical Programming (1997)
  • Anderson, B. D., et al. Optimal control: linear quadratic methods (2007)
  • Aswani, A., et al. Inverse optimization with noisy data (2015)
  • Berret, B., et al. Why don’t we move slower? The value of time in the neural control of action. Journal of Neuroscience (2016)
  • Bertsimas, D., et al. Data-driven estimation in equilibrium using inverse optimization. Mathematical Programming (2015)
  • Boyd, S., et al. Linear matrix inequalities in system and control theory (vol. 15) (1994)
  • Finn, C., Levine, S., & Abbeel, P. Guided cost learning: Deep inverse optimal control via policy optimization (2016) …
  • Fujii, T. A new approach to the LQ design from the viewpoint of the inverse regulator problem. IEEE Transactions on Automatic Control (1987)
  • Goodwin, G., et al. Constrained control and estimation: an optimisation approach (2006)
  • Hatz, K., et al. Estimating parameters in optimal control problems. SIAM Journal on Scientific Computing (2012)

Han Zhang received his B.S. and M.S. degrees from the Dept. of Automation at Shanghai Jiaotong University in 2011 and 2014, respectively. He has been a Ph.D. student in the Dept. of Mathematics at KTH Royal Institute of Technology since 2014. His main research interests are control of multi-agent systems, distributed optimization, and optimal control.

    Jack Umenberger received the B.E. (Hons 1) and Ph.D. degrees in Mechatronics Engineering from The University of Sydney, Australia, in 2012 and 2017, respectively. He is currently a postdoctoral researcher at Uppsala University, Sweden. His research interests include system identification, robust adaptive and dual control, and reinforcement learning.

Xiaoming Hu received the B.S. degree from the University of Science and Technology of China in 1983, and the M.S. and Ph.D. degrees from Arizona State University in 1986 and 1989, respectively. He served as a research assistant at the Institute of Automation, the Chinese Academy of Sciences, from 1983 to 1984. From 1989 to 1990 he was a Gustafsson Postdoctoral Fellow at the Royal Institute of Technology, Stockholm, where he is currently a Professor of Optimization and Systems Theory. His main research interests are in nonlinear control systems, nonlinear observer design, sensing and active perception, motion planning, control of multi-agent systems, and mobile manipulation.

This work is supported in part by the China Scholarship Council. The material in this paper was not presented at any conference. This paper was recommended for publication in revised form by Associate Editor Delin Chu under the direction of Editor Ian R. Petersen.
