Automatica

Volume 87, January 2018, Pages 412–420

Brief paper
Model-free event-triggered control algorithm for continuous-time linear systems with optimal performance

https://doi.org/10.1016/j.automatica.2017.03.013

Abstract

This paper proposes a new model-free event-triggered optimal control algorithm for continuous-time linear systems. The problem is formulated as an infinite-horizon optimal adaptive learning problem, which allows us to simultaneously design the controller and the triggering mechanism with guaranteed optimal performance by design. In order to provide a model-free solution, we adopt a Q-learning framework with a critic network to approximate the optimal cost and a zero-order-hold actor network to approximate the optimal control. Since the closed-loop dynamics evolve in both continuous and discrete time, we write the closed-loop system as an impulsive model and prove asymptotic stability of its equilibrium. A numerical simulation of an unknown unstable system is presented to show the efficacy of the proposed approach.

Introduction

In conventional implementations of control systems, control tasks are usually executed by periodically sampling the plant's output and updating the control inputs. Selection of the sampling period is traditionally performed during the control design stage, having in mind the trade-off between the reconstruction of the continuous-time signal and the load on the computer (Åström & Wittenmark, 1997). In many applications, however, conventional design methods may not represent an efficient solution. In networked control systems, where communication channels are shared by multiple and possibly remotely located sensors, actuators and controllers (Hespanha, Naghshtabrizi, & Xu, 2007), periodic and high-frequency sampling, computation and transmission of data could result in inefficient use of bandwidth and energy resources.

In systems with limited bandwidth, event-triggered control (Åström and Bernhardsson, 1999, Heemels et al., 2012, Tabuada, 2007) and self-triggered control (Anta and Tabuada, 2010, Gommans et al., 2014, Wang and Lemmon, 2009) are two emerging control strategies that have been shown to be suitable for reducing the communication between actuators/sensors and the controller. This is attained by letting the system evolve in open loop and only closing the loop whenever a user-designed triggering condition that guarantees stability and performance is satisfied. Sparser communication and less computation can result in decongestion of the network channels and energy savings for the devices.

While the event-triggered control and self-triggered control literature continues to flourish, two fundamental issues remain overshadowed, as pointed out in Gommans et al. (2014): the co-design of both the feedback law and the triggering scheme, and performance guarantees by design for the proposed algorithm. So far, to the best knowledge of the authors, only a few approaches have tried to simultaneously address these points. For example, in Gommans et al. (2014), an optimal self-triggered control with a discounted quadratic cost function for discrete-time linear systems is proposed. It is shown that in some cases the sparse communication strategy can outperform the traditional periodic time-triggered one. In Peng and Yang (2013), the authors explore another approach to the co-design problem with an $H_\infty$ performance index for networked control systems with communication delays and packet losses. However, these methods rely on an offline computation of the Riccati or Hamilton–Jacobi–Bellman equation and depend on full knowledge of the system dynamics, which requires exhaustive modeling and leaves them vulnerable to malicious attacks.

The combination of optimal control theory (Lewis & Syrmos, 1995) and adaptive control theory (Ioannou & Fidan, 2006) can be brought together with ideas from reinforcement learning (Sutton & Barto, 1998) to overcome these issues. Approximate dynamic programming has been shown to be a powerful tool to solve reinforcement learning problems in an adaptive way and also to guarantee optimal performance (Busoniu et al., 2010, Powell, 2007, Vrabie et al., 2012, Zhang et al., 2012). Actor/critic algorithms are a form of reinforcement learning (Sutton & Barto, 1998) which use an actor structure to select the control policies that improve the performance and a critic structure to evaluate the actor's decisions.

Recently, some studies have developed event-triggered control algorithms for problems where the system dynamics are unknown. One of the earliest attempts was presented in Arzen (1999) and further developed in Durand and Marchand (2009) and in Wang, Mounier, Cela, and Niculescu (2011), where the authors used a PID-type controller with event-based updates. Despite their inherently simple structure and ease of tuning, these types of controllers do not provide any optimality guarantees. As for event-triggered control algorithms for unknown systems with optimality guarantees, Sahoo, Xu, and Jagannathan (2017) presents an algorithm where a neural-network-based identifier is used to approximate the unknown nonlinear continuous-time system. The resulting closed-loop signals are locally ultimately bounded and the controller is near-optimal. For the cases where full state information is not available, Zhong and He (2017) proposes a scheme combining a neural-network-based observer with an optimal event-triggered control algorithm. As a result of the approximation, the stability result obtained is local and the closed-loop signals are also ultimately bounded.

In our previous work (Vamvoudakis, 2014), we derived a novel optimal adaptive event-triggered control algorithm for known nonlinear systems using an approach based on Hamiltonians. That approach did not enable us to define a model-free scheme. For that reason, in this paper, we derive a novel model-free approach based on Q-learning while also guaranteeing that Zeno behavior is excluded. Q-learning is a model-free reinforcement learning technique (Bertsekas and Tsitsiklis, 1996, Busoniu et al., 2010, Powell, 2007, Sutton and Barto, 1998, Zhang et al., 2012) primarily developed for discrete-time systems, where an optimal action is selected based on previous observations of states and actions (Watkins & Dayan, 1992). It learns an action-dependent value function that ultimately gives the expected utility of taking a given action in a given state and following the optimal policy thereafter. Once such an action-dependent value function is learned, the optimal policy can be computed easily. The biggest strength of Q-learning is that it does not require a model of the system to be controlled.
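To make the idea concrete, the sketch below shows the standard tabular Q-learning update for a small finite Markov decision process; the transition and reward arrays are hypothetical placeholders, and the paper itself replaces the lookup table with a quadratic Q-function suited to the continuous-time, event-triggered setting.

```python
import numpy as np

# Minimal tabular Q-learning sketch (illustrative only; the paper uses a
# quadratic, parameterized Q-function rather than a lookup table).
n_states, n_actions = 4, 2
rng = np.random.default_rng(0)
T = rng.integers(0, n_states, size=(n_states, n_actions))  # hypothetical deterministic transitions
R = rng.normal(size=(n_states, n_actions))                  # hypothetical rewards

Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.1                          # step size, discount, exploration

s = 0
for step in range(5000):
    # epsilon-greedy action selection from the current Q estimate
    a = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(Q[s]))
    s_next, r = int(T[s, a]), R[s, a]
    # model-free update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
    s = s_next

greedy_policy = np.argmax(Q, axis=1)  # once Q is learned, the optimal action is a pure argmax
```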

Main results: The contributions of the paper are threefold. We first show that the optimal event-triggered control policy is suboptimal with respect to the time-triggered optimal control policy. Further, in order to derive a scheme that is independent of the system matrices, we use Q-learning and derive appropriate tuning laws to learn the newly proposed Q-function. Specifically, we use an actor/critic structure that adaptively tunes and approximates the optimal event-triggered controller and the Q-function, respectively, to solve the problem online and forward in time. Finally, by using an impulsive model for the closed-loop system, we prove that the equilibrium point of the flow and jump dynamics is globally asymptotically stable.

Structure: This paper is structured as follows. Section 2 formulates the infinite-horizon optimal control problem, and Section 3 provides a brief background on the optimal control solution and the relationship between the optimal time-triggered and event-triggered control policies. Since Section 3 relies on complete knowledge of the system matrices, Section 4 provides a model-free formulation based on a Q-learning approach and an actor/critic structure to estimate the parameters of the Q-function. A rigorous Lyapunov-based proof of asymptotic stability is provided, and the existence of a positive lower bound for the inter-event times is shown. Numerical simulations are presented in Section 5 to show the efficacy of the proposed algorithm, and finally Section 6 concludes and discusses future work.

Notation: The notation used here is standard. $\mathbb{R}^+$ is the set of positive real numbers. We denote by $\underline{\lambda}(A)$ and $\bar{\lambda}(A)$ the minimum and maximum eigenvalues, respectively, of a matrix $A$. Also, $\|\cdot\|$ denotes the Euclidean norm for a vector and the Frobenius norm for a matrix. The Kronecker product is represented by $\otimes$, and the half-vectorization $\mathrm{vech}(A)$ of a symmetric $n\times n$ matrix $A$ is the $n(n+1)/2$ column vector obtained by vectorizing only the lower (or upper) triangular part of $A$, $\mathrm{vech}(A)=[A_{1,1}\ \cdots\ A_{n,1}\ A_{2,2}\ \cdots\ A_{n,2}\ \cdots\ A_{n-1,n-1}\ A_{n-1,n}\ A_{n,n}]^T$, and $\mathrm{mat}(\cdot)$ is the inverse vech operation, known as matrization. The colon symbol $:$ can be used to form implicit vectors from a matrix or vector, i.e. $A(j{:}k)$, $k>j$, is $[A(j)\ A(j+1)\ \cdots\ A(k)]$. A function $\kappa:\mathbb{R}^+\to\mathbb{R}$ is said to belong to class $\mathcal{K}$ if it is continuous, strictly increasing and $\kappa(0)=0$. Throughout this work, we will use the closed-loop impulsive system formulation as in Haddad, Chellaboina, and Nersesov (2006) and Hespanha, Liberzon, and Teel (2008), defined as follows:
$$\begin{cases}\dot{\psi}=f(\psi), & t\in(r_j,r_{j+1}],\\ \psi^+=g(\psi), & t=r_j,\end{cases}$$
where $\{r_j\}_{j=0}^{\infty}$ is a monotonically increasing sequence of sampling instants, with $r_j$ the $j$th consecutive sampling instant satisfying $\lim_{j\to\infty}r_j=\infty$; the state $\psi\in\mathbb{R}^{n_\psi}$ is continuous between the sampling instants; and $f$ and $g$ are the flow and the jump dynamics, respectively, both mapping $\mathbb{R}^{n_\psi}$ to $\mathbb{R}^{n_\psi}$. We denote by $(\cdot)^+$ the right-limit operator, i.e., $\psi^+=\lim_{s\downarrow t}\psi(s)$.
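As a concrete illustration of the half-vectorization and Kronecker notation above, here is a small numpy sketch (an illustrative implementation following the column ordering defined here, not code from the paper):

```python
import numpy as np

def vech(A):
    """Half-vectorization: stack the lower-triangular part of a symmetric
    n x n matrix column by column into an n(n+1)/2 vector."""
    n = A.shape[0]
    return np.concatenate([A[j:, j] for j in range(n)])

def mat(v):
    """Inverse of vech ("matrization"): rebuild the symmetric matrix."""
    # n(n+1)/2 = len(v)  =>  n = (-1 + sqrt(1 + 8*len(v))) / 2
    n = int((-1 + np.sqrt(1 + 8 * len(v))) / 2)
    A = np.zeros((n, n))
    idx = 0
    for j in range(n):
        A[j:, j] = v[idx:idx + n - j]
        idx += n - j
    return A + A.T - np.diag(np.diag(A))   # mirror the strictly lower part

A = np.array([[1.0, 2.0], [2.0, 5.0]])      # symmetric 2 x 2 example
v = vech(A)                                  # -> [1, 2, 5]
assert np.allclose(mat(v), A)                # matrization recovers the matrix
K = np.kron(A, np.eye(2))                    # Kronecker product example
```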

Section snippets

Problem formulation

Consider the following linear time-invariant continuous-time system:
$$\dot{x}(t)=Ax(t)+Bu(t),\quad x(0)=x_0,\ t\ge 0,$$
where $x(t)\in\mathbb{R}^n$ is a measurable state vector, $u(t)\in\mathbb{R}^m$ is the control input, and $A\in\mathbb{R}^{n\times n}$, $B\in\mathbb{R}^{n\times m}$ are the plant and input matrices, respectively, which will be considered uncertain/unknown.

To save resources, the controller will work with a sampled version of the state, defined as follows:
$$\hat{x}(t)=\begin{cases}x(r_j), & t\in(r_j,r_{j+1}],\\ x(t), & t=r_j.\end{cases}$$
The controller maps the sampled state onto a control vector which after using a
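A minimal sketch of this sample-and-hold mechanism is given below; the relative-error triggering test and the gain K used here are placeholders for illustration only, since the paper's actual condition and controller are designed jointly later in the text.

```python
import numpy as np

def event_triggered_step(x, x_hat, K, beta=0.5):
    """One decision instant of the sample-and-hold scheme: the controller only
    ever sees the held state x_hat = x(r_j); when the (placeholder) trigger
    fires, the loop is closed and x_hat jumps to the current plant state x."""
    if np.linalg.norm(x - x_hat) >= beta * np.linalg.norm(x):  # placeholder trigger
        x_hat = x.copy()                                        # event: resample/transmit
    u = -K @ x_hat                                              # zero-order-hold control
    return u, x_hat
```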

Connection between the time-triggered and the event-triggered LQR

One can define the time-triggered Hamiltonian associated with (1), (3), with controller $u_c$, as follows:
$$H(x,u_c,V_x)=V_x^T(Ax+Bu_c)+\tfrac{1}{2}x^T H x+\tfrac{1}{2}u_c^T R u_c,\quad \forall x,u_c.$$
After employing the stationarity condition in the Hamiltonian (4), i.e. $\frac{\partial H(x,u_c,V_x)}{\partial u_c}=0$, the time-triggered optimal control can be found to be
$$u_c^*\equiv u_c^*(x)=\arg\min_{u_c}H(x,u_c,V_x)=-R^{-1}B^T V_x,\quad \forall x.$$
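For the linear-quadratic case, the stationarity condition together with a quadratic value function (so that $V_x = Px$) leads to the familiar Riccati-based gain; the sketch below is a model-based baseline computed with scipy, and it deliberately uses the matrices that the rest of the paper treats as unknown.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

def time_triggered_lqr(A, B, H, R):
    """Model-based, time-triggered baseline: solve the continuous-time
    algebraic Riccati equation for P and return the gain of
    u_c*(x) = -R^{-1} B^T P x.  The event-triggered Q-learning scheme in
    the paper is meant to avoid exactly this dependence on (A, B)."""
    P = solve_continuous_are(A, B, H, R)
    K = np.linalg.solve(R, B.T @ P)   # K = R^{-1} B^T P
    return P, K

# usage with arbitrary (stabilizable/detectable) placeholder matrices
A = np.array([[0.0, 1.0], [-2.0, 3.0]])
B = np.array([[0.0], [1.0]])
H = np.eye(2)
R = np.array([[1.0]])
P, K = time_triggered_lqr(A, B, H, R)  # time-triggered optimal control: u_c* = -K x
```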

Assumption 1

We assume that the pair $(A,B)$ is stabilizable and the pair $(H,A)$ is detectable.

Since the system (1) is linear, we can represent the time-triggered

Model free formulation

The value function (6) needs to be parameterized as a function of the state $x$ and the control $u_d$ to represent the Q-function. We can write the following Q-function, or action-dependent value, $Q(x,u_d):\mathbb{R}^{n+m}\to\mathbb{R}^+$:
$$Q(x,u_d)\triangleq V^*(x)+H(x,u_d,V_x^*)-H\Big(x,u_c^*,\tfrac{\partial V^*(x)}{\partial x}\Big)=V^*(x)+\tfrac{1}{2}x^T P(Ax+Bu_d)+\tfrac{1}{2}(Ax+Bu_d)^T P x+\tfrac{1}{2}u_d^T R u_d+\tfrac{1}{2}x^T H x,\quad \forall x,u_d,$$
where $H(x,u_c^*,V_x^*)=0$ and the optimal time-triggered cost is $V^*(x)=x^T P x$.
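As a quick numerical sanity check of this definition, the sketch below evaluates $Q(x,u_d)$ exactly as written and verifies that its minimizer over $u_d$ coincides with the model-based gain $-R^{-1}B^TPx$; the matrices are placeholders borrowed from the simulation section, and $P$ is computed from the model here only because the learned kernel is not available in this excerpt.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Placeholder data (borrowed from the simulation example; the -2 entry is an assumption).
A = np.array([[0.0, 1.0], [-2.0, 3.0]])
B = np.array([[0.0], [1.0]])
H, R = 11 * np.eye(2), np.array([[1.0]])
P = solve_continuous_are(A, B, H, R)          # stand-in for the (learned) kernel

def Q(x, u_d):
    """Action-dependent value evaluated exactly as in the definition above,
    with the optimal time-triggered cost V*(x) = x^T P x as stated in the text."""
    Ax_Bu = A @ x + B @ u_d
    return (x @ P @ x
            + 0.5 * x @ P @ Ax_Bu + 0.5 * Ax_Bu @ P @ x
            + 0.5 * u_d @ R @ u_d + 0.5 * x @ H @ x)

x = np.array([1.0, -0.5])
u_star = -np.linalg.solve(R, B.T @ P @ x)     # model-based minimizer -R^{-1} B^T P x
# brute-force check: Q(x, .) is minimized (up to grid resolution) at u_star
grid = np.linspace(u_star[0] - 1.0, u_star[0] + 1.0, 2001)
best = grid[np.argmin([Q(x, np.array([u])) for u in grid])]
assert abs(best - u_star[0]) < 1e-2
```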

The Q-function (12) can be written in a compact quadratic form in the state $x$ and control $u_d$ as follows: $Q(x,u_d)=\frac{1}{2}U^T[\,P+H+P$

Simulation

In order to show the effectiveness of the proposed model-free event-triggered control algorithm, we use an example adopted from Tabuada (2007). Consider the following second-order unstable linear system:
$$\dot{x}=\begin{bmatrix}0 & 1\\ -2 & 3\end{bmatrix}x+\begin{bmatrix}0\\ 1\end{bmatrix}u,$$
with performance matrices picked as $H=11I$ and $R=1$, where $I$ is the second-order identity matrix. The triggering parameter is selected as $\beta=0.6$, the constants are picked as $L=10$, $L_1=2$, $T=0.01\,\mathrm{s}$, and the tuning gains for the actor and the critic approximators are $\alpha_a=0.5$ and $\alpha_c=20$,
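Since the learned actor/critic update laws and the exact triggering condition are not included in this excerpt, the following sketch is only a rough, hedged reconstruction of the experiment: it uses the model-based LQR gain as a stand-in for the learned actor, a generic relative-error trigger with the given $\beta$, and the $-2$ entry of the plant matrix as an assumption taken from Tabuada's example.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Parameters from the simulation section; the plant's -2 entry and the trigger
# rule are assumptions, everything else (H, R, beta, T) is as listed above.
A = np.array([[0.0, 1.0], [-2.0, 3.0]])
B = np.array([[0.0], [1.0]])
H, R, beta, T = 11 * np.eye(2), np.array([[1.0]]), 0.6, 0.01

P = solve_continuous_are(A, B, H, R)        # stand-in for the learned Q-function kernel
K = np.linalg.solve(R, B.T @ P)             # stand-in for the learned actor gain

x = np.array([1.0, -1.0])                   # arbitrary initial condition
x_hat = x.copy()
events = 0
for k in range(int(5.0 / T)):               # 5 s of simulated time, Euler steps of length T
    if np.linalg.norm(x - x_hat) >= beta * np.linalg.norm(x):
        x_hat, events = x.copy(), events + 1        # event: transmit and resample the state
    u = -K @ x_hat                                  # zero-order-hold control between events
    x = x + T * (A @ x + B @ u)                     # forward-Euler integration of the plant
print("events:", events, "final state norm:", np.linalg.norm(x))
```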

Conclusion

In this work, we presented a novel control algorithm that combines ideas from event-triggered control, optimal control and Q-learning theories. We formulated the problem of control under sparse communication as an optimization problem and used ideas from integral reinforcement learning to write the Q-function as a parametrization of the state and the actions. Since Q-learning needs to store a large amount of data throughout learning, we used an approximate dynamic programming framework which


References (28)

  • Durand, S., & Marchand, N. (2009). Further results on event-based PID controller. In European Control Conference (ECC)...
  • Haddad, W. M., Chellaboina, V., & Nersesov, S. G. (2006). Impulsive and hybrid dynamical systems: Stability, dissipativity, and control. Princeton University Press.
  • Heemels, W. P. M. H., Johansson, K. H., & Tabuada, P. (2012). An introduction to event-triggered and self-triggered control...
  • Hespanha, J., Naghshtabrizi, P., & Xu, Y. (2007). A survey of recent results in networked control systems. In Proc....

    Kyriakos G. Vamvoudakis was born in Athens, Greece. He received the diploma (a 5 year degree, equivalent to a master of science) in electronic and computer engineering from Technical University of Crete, Greece in 2006 with highest honors. After moving to the United States of America, he studied at The University of Texas and received his M.S. and Ph.D. in electrical engineering in 2008 and 2011, respectively. During the period from 2012 to 2016, he was a project research scientist at the Center for Control, Dynamical Systems and Computation at the University of California, Santa Barbara. He is now an assistant professor at the Kevin T. Crofton Department of Aerospace and Ocean Engineering at Virginia Tech. His research interests have focused on game-theoretic control, network security, smart grid and multi-agent optimization.

    He is the recipient of several international awards, including the 2016 International Neural Network Society (INNS) Young Investigator Award, the Best Paper Award for Autonomous/Unmanned Vehicles at the 27th Army Science Conference in 2010, the Best Presentation Award at the World Congress of Computational Intelligence in 2010, and the Best Researcher Award from the Automation and Robotics Research Institute in 2011.

    He is a coauthor of one patent, more than 90 technical publications, and two books. He currently is an associate editor of the Journal of Optimization Theory and Applications, an associate editor of Control Theory and Technology, a registered electrical/computer engineer (PE) and a member of the Technical Chamber of Greece. He is a senior member of IEEE.

    Henrique Ferraz received his B.S. degree in control and automation engineering and his M.S. degree in electrical engineering, both from Federal University of Rio de Janeiro, Brazil in 2009 and 2012. Currently, he is pursuing his Ph.D. degree in the area of control systems, in the Department of Electrical and Computer Engineering at the University of California, Santa Barbara. His research interests include network control systems, estimation theory, optimization, and learning.

    The work was partially supported by a Virginia Tech startup fund and by a CAPES BEX 1111-13-1 grant. The material in this paper was not presented at any conference. This paper was recommended for publication in revised form by Associate Editor Akira Kojima under the direction of Editor Ian R. Petersen.
