Brief paper
Model-free event-triggered control algorithm for continuous-time linear systems with optimal performance☆
Introduction
In conventional implementations of control systems, control tasks are usually executed by periodically sampling the plant's output and updating the control inputs. Selection of the sampling period is traditionally performed during the control design stage, having in mind the trade-off between the reconstruction of the continuous-time signal and the load on the computer (Astrom & Wittenmark, 1997). In many applications, however, conventional design methods may not represent an efficient solution. In networked control systems, where communication channels are shared among multiple and possibly remotely located sensors, actuators and controllers (Hespanha, Naghshtabrizi, & Xu, 2007), periodic and high-frequency sampling, computation and transmission of data can result in inefficient use of bandwidth and energy resources.
In systems with limited bandwidth, event-triggered control (Astrom and Bernhardsson, 1999, Heemels et al., 2012, Tabuada, 2007) and self-triggered control (Anta and Tabuada, 2010, Gommans et al., 2014, Wang and Lemmon, 2009) represent two emerging control strategies that have been shown to be suitable for reducing the communication between the actuators/sensors and the controller. This is attained by letting the system evolve in open loop and only closing the loop whenever a user-designed triggering condition that guarantees stability and performance is satisfied. Sparse communication and less computation can result in decongestion of the network channels and energy savings for the devices.
While the event-triggered control and self-triggered control literature continues to flourish, two fundamental issues remain overshadowed, as pointed out in Gommans et al. (2014): the co-design of both the feedback law and the triggering scheme, and the performance guarantees by design of the proposed algorithm. So far, to the best knowledge of the authors, only a few approaches have tried to address these points simultaneously. For example, in Gommans et al. (2014), an optimal self-triggered control with a discounted quadratic cost function for discrete-time linear systems is proposed. It is shown that in some cases the sparse communication strategy can outperform the traditional periodic time-triggered one. In Peng and Yang (2013), the authors explore another approach to the co-design problem, with an $H_\infty$ performance index, for networked control systems with communication delays and packet losses. However, these methods rely on an offline solution of a Riccati or Hamilton–Jacobi–Bellman equation and depend on full knowledge of the system dynamics, which makes them dependent on exhaustive modeling and vulnerable to malicious attacks.
The combination of optimal control theory (Lewis & Syrmos, 1995) and adaptive control theory (Ioannou & Fidan, 2006) can be brought together with ideas from reinforcement learning (Sutton & Barto, 1998) to overcome these issues. Approximate dynamic programming has been shown to be a powerful tool for solving reinforcement learning problems adaptively while also guaranteeing optimal performance (Busoniu et al., 2010, Powell, 2007, Vrabie et al., 2012, Zhang et al., 2012). Actor/critic algorithms are a form of reinforcement learning (Sutton & Barto, 1998) that uses an actor structure to select control policies that improve performance and a critic structure to evaluate the actor's decisions.
Recently, some studies have sought to design event-triggered control algorithms for problems where the system dynamics are unknown. One of the earliest attempts was presented in Arzen (1999) and further developed in Durand and Marchand (2009) and in Wang, Mounier, Cela, and Niculescu (2011), where the authors used a PID-type controller with event-based updates. Despite their inherently simple structure and ease of tuning, such controllers do not provide any optimality guarantees. As for event-triggered control algorithms for unknown systems with optimality guarantees, Sahoo, Xu, and Jagannathan (2017) present an algorithm where a neural-network-based identifier is used to approximate the unknown nonlinear continuous-time system. The resulting closed-loop signals are locally ultimately bounded and the controller is near-optimal. For cases where full state information is not available, Zhong and He (2017) propose a scheme combining a neural-network-based observer with an optimal event-triggered control algorithm. As a result of the approximation, the stability result obtained is local and the closed-loop signals are also ultimately bounded.
In our previous work (Vamvoudakis, 2014), we derived a novel optimal adaptive event-triggered control algorithm for known nonlinear systems using an approach based on Hamiltonians. This did not enable us to define a model-free approach. For that reason, in this paper, we derive a novel model-free approach based on Q-learning while also guaranteeing that Zeno behavior is excluded. Q-learning is a model-free reinforcement learning technique (Bertsekas and Tsitsiklis, 1996, Busoniu et al., 2010, Powell, 2007, Sutton and Barto, 1998, Zhang et al., 2012) primarily developed for discrete-time systems, where an optimal action is selected based on previous state and action observations (Watkins & Dayan, 1992). It learns an action-dependent value function that ultimately gives the expected utility of taking a given action in a given state and following the optimal policy thereafter. Once such an action-dependent value function is learned, the optimal policy can be computed easily. The biggest strength of Q-learning is that it does not require a model of the system to be controlled.
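As a concrete illustration of the classical discrete-time update the text refers to (Watkins & Dayan), and not of the paper's continuous-time formulation, a minimal tabular Q-learning step can be sketched as follows; the state/action sizes, learning rate and discount are arbitrary choices for the example:

```python
import numpy as np

# One step of tabular Q-learning:
#   Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
# After learning, the greedy policy at a state is argmax_a Q(s,a).
def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    td_target = r + gamma * np.max(Q[s_next])   # bootstrapped target
    Q[s, a] += alpha * (td_target - Q[s, a])    # move toward the target
    return Q

Q = np.zeros((3, 2))                 # 3 states, 2 actions, all-zero init
Q = q_update(Q, s=0, a=1, r=1.0, s_next=2)
policy_0 = np.argmax(Q[0])           # greedy action at state 0
```

Note that no model of the transition dynamics appears anywhere in the update; only observed (s, a, r, s') tuples are used, which is exactly the model-free property the paper exploits.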
Main results: The contributions of this paper are threefold. We first show that the optimal event-triggered control policy is suboptimal with respect to the time-triggered optimal one. Then, in order to derive a scheme that is independent of the system matrices, we use Q-learning and derive appropriate tuning laws to learn the newly proposed Q-function. Specifically, we use an actor/critic structure that adaptively tunes and approximates the optimal event-triggered controller and the Q-function, respectively, to solve the problem online and forward in time. Finally, by using an impulsive-systems model of the closed loop, we prove that the equilibrium point of the flow and jump dynamics is globally asymptotically stable.
Structure: This paper is structured as follows. Section 2 formulates the infinite-horizon optimal control problem. Section 3 provides a brief background on the optimal control solution and the relationship between the optimal time-triggered and event-triggered control policies. Since Section 3 relies on complete knowledge of the system matrices, Section 4 provides a model-free formulation based on a Q-learning approach and an actor/critic structure to estimate the parameters of the Q-function. A rigorous Lyapunov-based proof of asymptotic stability is provided, and the existence of a positive lower bound for the inter-event times is shown. Numerical simulations are presented in Section 5 to show the efficacy of the proposed algorithm, and finally Section 6 concludes and discusses future work.
Notation: The notation used here is standard. $\mathbb{R}_{+}$ is the set of positive real numbers. We denote by $\lambda_{\min}(\cdot)$ and $\lambda_{\max}(\cdot)$ the minimum and maximum eigenvalues of a matrix, respectively. $\|\cdot\|$ denotes the Euclidean norm for a vector and the Frobenius norm for a matrix. The Kronecker product is represented by $\otimes$, and the half-vectorization $\mathrm{vech}(S)$ of a symmetric matrix $S$ is the column vector obtained by vectorizing only the lower (or upper) triangular part of $S$; $\mathrm{vech}^{-1}(\cdot)$ is the inverse operation, known as matrization. The colon symbol : can be used to form implicit vectors from a matrix or vector, i.e., $X(:)$ stacks the columns of $X$ into a vector. A function $\alpha : [0,\infty) \to [0,\infty)$ is said to belong to class $\mathcal{K}$ if it is continuous, strictly increasing, and $\alpha(0) = 0$. Throughout this work, we will use the closed-loop impulsive system formulation, as in Haddad, Chellaboina, and Nersesov (2006) and Hespanha, Liberzon, and Teel (2008), defined as follows: $$\dot{x}(t) = f_c(x(t)), \quad t \neq r_j, \qquad x^{+}(t) = f_d(x(t)), \quad t = r_j,$$ where $\{r_j\}_{j \in \mathbb{N}}$ is a monotonically increasing sequence of sampling instants, with the $j$th consecutive sampling instant satisfying $r_{j+1} > r_j$; the state $x$ is continuous between the sampling instants; and $f_c$ and $f_d$ are the flow and the jump dynamics, respectively, mapping $\mathbb{R}^n$ to $\mathbb{R}^n$. We denote by $(\cdot)^{+}$ the right-limit operator, i.e., $x^{+}(t) = \lim_{s \to t^{+}} x(s)$.
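The half-vectorization and its inverse (matrization) used above can be sketched numerically; the helper names `vech`/`unvech` are ours for illustration, not the paper's:

```python
import numpy as np

# Half-vectorization of a symmetric matrix and its inverse ("matrization"),
# as used when parameterizing a symmetric quadratic kernel with the minimal
# number n(n+1)/2 of free parameters.
def vech(S):
    """Stack the lower-triangular part of symmetric S into a vector."""
    return S[np.tril_indices(S.shape[0])]

def unvech(v, n):
    """Rebuild the symmetric n x n matrix from its half-vectorization."""
    S = np.zeros((n, n))
    S[np.tril_indices(n)] = v
    return S + S.T - np.diag(np.diag(S))   # mirror, without doubling the diagonal

S = np.array([[1.0, 2.0],
              [2.0, 5.0]])
v = vech(S)            # length n(n+1)/2 = 3 instead of n^2 = 4
S_back = unvech(v, 2)  # round-trips back to S
```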
Problem formulation
Consider the following linear time-invariant continuous-time system $$\dot{x}(t) = A x(t) + B u(t),$$ where $x(t) \in \mathbb{R}^n$ is a measurable state vector, $u(t) \in \mathbb{R}^m$ is the control input, and $A \in \mathbb{R}^{n \times n}$, $B \in \mathbb{R}^{n \times m}$ are the plant and input matrices, respectively, which will be considered uncertain/unknown.
To save resources, the controller will work with a sampled version of the state, defined as follows: $$\hat{x}(t) = x(r_j), \quad \forall t \in [r_j, r_{j+1}).$$ The controller maps the sampled state $\hat{x}$ onto a control vector which, after using a
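The sample-and-hold mechanism just described can be sketched as follows: between events the controller only sees the last broadcast state $\hat{x} = x(r_j)$, and a new sample is taken when the gap between $\hat{x}$ and the current state violates a triggering threshold. The relative (Tabuada-style) threshold used here is illustrative, not necessarily the paper's exact condition:

```python
import numpy as np

# Event check: resample when the hold error e = x_hat - x grows beyond
# a fraction sigma of the current state norm (illustrative threshold form).
def triggered(x, x_hat, sigma=0.5):
    return np.linalg.norm(x_hat - x) > sigma * np.linalg.norm(x)

x_hat = np.array([1.0, 0.0])   # last broadcast (held) state
x_now = np.array([0.4, 0.1])   # current plant state

if triggered(x_now, x_hat):
    x_hat = x_now.copy()       # event: broadcast and update the hold
```

Between events no communication occurs, which is the source of the resource savings; the analysis in the paper is what guarantees this sparse scheme remains stabilizing.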
Connection between the time-triggered and the event-triggered LQR
One can define the time-triggered Hamiltonian associated with (1), (3), with controller $u$, as follows: $$H(x, u) = x^\top Q x + u^\top R u + \nabla V^\top (A x + B u).$$ After employing the stationarity condition in the Hamiltonian (4), i.e., $\partial H / \partial u = 0$, the time-triggered optimal control can be found to be $$u^*(x) = -R^{-1} B^\top P x.$$
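As a model-based baseline for comparison only (the point of the paper is to avoid needing $A$ and $B$), the time-triggered optimal gain can be computed from the continuous-time algebraic Riccati equation; the matrices below are illustrative:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Time-triggered LQR baseline: u*(x) = -R^{-1} B^T P x, with P solving
# A^T P + P A - P B R^{-1} B^T P + Q = 0. Matrices here are illustrative.
A = np.array([[0.0, 1.0],
              [-2.0, 3.0]])     # open-loop unstable plant
B = np.array([[0.0],
              [1.0]])
Qc = np.eye(2)                  # state weighting
R = np.eye(1)                   # control weighting

P = solve_continuous_are(A, B, Qc, R)
K = np.linalg.inv(R) @ B.T @ P  # optimal feedback gain
A_cl = A - B @ K
stable = bool(np.all(np.linalg.eigvals(A_cl).real < 0))
```

This requires full knowledge of (A, B), which is exactly what the Q-learning formulation of Section 4 removes.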
Assumption 1. We assume that the pair $(A, B)$ is stabilizable and the pair $(\sqrt{Q}, A)$ is detectable. □
Model free formulation
The value function (6) needs to be parameterized as a function of the state and the control to represent the Q-function. We can write the following Q-function, or action-dependent value, $Q(x, u) = V^*(x) + H(x, u, \nabla V^*)$, where $H$ is the Hamiltonian (4), and the optimal time-triggered cost is $V^*(x) = x^\top P x$.
The Q-function (12) can be written in a compact quadratic form in the state and control as follows: $$Q(x, u) = \begin{bmatrix} x \\ u \end{bmatrix}^\top \bar{Q} \begin{bmatrix} x \\ u \end{bmatrix}, \qquad \bar{Q} = \begin{bmatrix} \bar{Q}_{xx} & \bar{Q}_{xu} \\ \bar{Q}_{ux} & \bar{Q}_{uu} \end{bmatrix}.$$
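The practical payoff of the quadratic form is that, once the kernel is learned, the greedy control needs only its blocks and never the plant matrices: minimizing over $u$ gives $u^*(x) = -\bar{Q}_{uu}^{-1}\bar{Q}_{ux}\,x$. A sketch with a made-up kernel (chosen so the $uu$-block is positive definite and the minimizer exists):

```python
import numpy as np

# Greedy policy from a learned quadratic kernel S of Q(x,u) = [x;u]^T S [x;u]:
#   u*(x) = -S_uu^{-1} S_ux x   -- no (A, B) needed, only the learned S.
n, m = 2, 1
S = np.array([[2.0, 0.5, 0.3],
              [0.5, 1.5, 0.2],
              [0.3, 0.2, 1.0]])   # illustrative symmetric kernel, S_uu > 0

S_uu = S[n:, n:]                  # u-u block (m x m)
S_ux = S[n:, :n]                  # u-x block (m x n)
K = np.linalg.solve(S_uu, S_ux)   # feedback gain, u = -K x

x = np.array([1.0, -1.0])
u = -K @ x
```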
Simulation
In order to show the effectiveness of the proposed model-free event-triggered control algorithm, we use a second-order unstable linear system adopted from Tabuada (2007), with the performance matrices picked as identity matrices of appropriate dimensions. The triggering parameter, the remaining constants, and the tuning gains for the critic and the actor approximators are selected accordingly.
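A rough closed-loop simulation in the spirit of this example can be sketched as follows. The system matrices are those of the Tabuada (2007) example; the model-based LQR gain stands in for the learned controller (the paper's gain is learned online), and the relative trigger threshold `sigma` is an illustrative, conservative choice rather than the paper's tuned value:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Event-triggered closed loop: the control is held between events, and an
# event fires when the hold error exceeds sigma * ||x|| (illustrative trigger).
A = np.array([[0.0, 1.0],
              [-2.0, 3.0]])               # Tabuada (2007) example plant
B = np.array([[0.0],
              [1.0]])
P = solve_continuous_are(A, B, np.eye(2), np.eye(1))
K = B.T @ P                               # model-based stand-in gain (R = I)

dt, T, sigma = 1e-3, 5.0, 0.01
steps = int(T / dt)
x = np.array([1.0, 1.0])                  # initial state
x_hat = x.copy()                          # last broadcast state
events = 0
for _ in range(steps):
    if np.linalg.norm(x_hat - x) > sigma * np.linalg.norm(x):
        x_hat = x.copy()                  # event: resample the state
        events += 1
    u = -K @ x_hat                        # control held between events
    x = x + dt * (A @ x + B @ u)          # forward-Euler step of the flow
```

With a relative trigger the state still converges while the number of broadcasts stays below the number of integration steps, which is the communication saving the section demonstrates.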
Conclusion
In this work, we presented a novel control algorithm that combines ideas from event-triggered control, optimal control, and Q-learning. We formulated the problem of control under sparse communication as an optimization problem and used ideas from integral reinforcement learning to write the Q-function as a parametrization of the state and the actions. Since Q-learning needs to store a large amount of data throughout learning, we used an approximate dynamic programming framework which
References (28)
- Gommans, T., Antunes, D., Donkers, T., Heemels, M., & Tabuada, P. (2014). Self-triggered linear quadratic control. Automatica.
- Hespanha, J. P., Liberzon, D., & Teel, A. R. (2008). Lyapunov conditions for input-to-state stability of impulsive systems. Automatica.
- Peng, C., & Yang, T. C. (2013). Event-triggered communication and $H_\infty$ control co-design for networked control systems. Automatica.
- Wang, J., Mounier, H., Cela, A., & Niculescu, S.-I. (2011). Event driven intelligent PID controllers with applications to motion control. IFAC Proceedings Volumes.
- Anta, A., & Tabuada, P. (2010). To sample or not to sample: Self-triggered control for nonlinear systems. IEEE Transactions on Automatic Control.
- Arzen, K. (1999). A simple event-based PID controller. In Proc. IFAC world congress, Vol. 18 (pp. ...).
- Astrom, K., & Bernhardsson, B. (1999). Comparison of periodic and event based sampling for first order stochastic...
- Astrom, K., & Wittenmark, B. (1997). Computer controlled systems: Theory and design.
- Bertsekas, D. P., & Tsitsiklis, J. N. (1996). Neuro-dynamic programming.
- Busoniu, L., Babuska, R., De Schutter, B., & Ernst, D. (2010). Reinforcement learning and dynamic programming using function approximators.
- Haddad, W. M., Chellaboina, V., & Nersesov, S. G. (2006). Impulsive and hybrid dynamical systems: Stability, dissipativity, and control.
Kyriakos G. Vamvoudakis was born in Athens, Greece. He received the diploma (a 5 year degree, equivalent to a master of science) in electronic and computer engineering from Technical University of Crete, Greece in 2006 with highest honors. After moving to the United States of America, he studied at The University of Texas and received his M.S. and Ph.D. in electrical engineering in 2008 and 2011, respectively. During the period from 2012 to 2016, he was a project research scientist at the Center for Control, Dynamical Systems and Computation at the University of California, Santa Barbara. He is now an assistant professor at the Kevin T. Crofton Department of Aerospace and Ocean Engineering at Virginia Tech. His research interests have focused on game-theoretic control, network security, smart grid and multi-agent optimization.
He is the recipient of several international awards including the 2016 International Neural Network Society Young Investigator (INNS) Award, the Best Paper Award for Autonomous/Unmanned Vehicles at the 27th Army Science Conference in 2010, in the Best Presentation Award at the World Congress of Computational Intelligence in 2010, and the Best Researcher Award from the Automation and Robotics Research Institute in 2011.
He is a coauthor of one patent, more than 90 technical publications, and two books. He currently is an associate editor of the Journal of Optimization Theory and Applications, an associate editor of Control Theory and Technology, a registered electrical/computer engineer (PE) and a member of the Technical Chamber of Greece. He is a senior member of IEEE.
Henrique Ferraz received his B.S. degree in control and automation engineering and his M.S. degree in electrical engineering, both from Federal University of Rio de Janeiro, Brazil in 2009 and 2012. Currently, he is pursuing his Ph.D. degree in the area of control systems, in the Department of Electrical and Computer Engineering at the University of California, Santa Barbara. His research interests include network control systems, estimation theory, optimization, and learning.
☆ The work was partially supported by a Virginia Tech startup fund and by a CAPES BEX 1111-13-1 grant. The material in this paper was not presented at any conference. This paper was recommended for publication in revised form by Associate Editor Akira Kojima under the direction of Editor Ian R. Petersen.