Online event-triggered adaptive critic design for multi-player zero-sum games of partially unknown nonlinear systems with input constraints
Introduction
As a crucial branch of game theory, differential game theory, which focuses on continuous-time systems, has been extensively applied in sociology, economics, military science and other research fields [1], [2], [3]. In the control field, differential games contribute to the solution of various problems such as multivehicle coordinated lane changes [4], missile interception [5] and cooperative target tracking by multiple robots [6]. Against these engineering backgrounds, a differential game is essentially an optimal control problem with multiple controllers [7]. Herein, the controllers can be regarded as players that have both team-based objectives and personal interests. According to the objectives of the players, differential games can be divided into two categories, i.e., zero-sum games and non-zero-sum games. This work focuses on multi-player zero-sum games (MP-ZSGs), for which the goal is to solve the Hamilton-Jacobi-Isaacs equations (HJIEs), a generalized version of the Hamilton-Jacobi-Bellman (HJB) equations of optimal control. For nonlinear systems, these equations are often intractable to solve analytically. Hence, different intelligent methods have been designed to tackle HJIEs [8], [9], [10].
Adaptive critic design (ACD) is a method that integrates the ideas of approximation and dynamic programming and can solve optimization problems effectively [11], [12]. ACD-based algorithms generally rely on the actor-critic framework, in which the actor collaborates with the critic to avoid the "curse of dimensionality". Specifically, the actor executes the control strategy with the current data, and the critic provides the actor with feedback obtained by evaluating the current strategy [13]. Since ACD, adaptive dynamic programming (ADP) and reinforcement learning (RL) share many ideas, they are often placed in the same category of algorithms [14]. Recently, various algorithms based on ACD/ADP/RL have been developed to solve optimization problems [15], [16], [17]. For example, in [18], an online RL method was developed to address the linear quadratic tracking problem without knowledge of the system drift dynamics. The optimal control of unknown systems with disturbances was addressed through adaptive fuzzy control in [19] and through an off-policy algorithm in [20]. Under the actor-critic framework, a near-optimal control scheme for nonzero-sum games of nonlinear systems was developed in [21] using single-network ADP. In [22], a neural network (NN) was employed to reconstruct the unknown system dynamics, and a single-critic ADP-based method was designed to learn the solutions of multi-player nonzero-sum games (MP-NZSGs).
In some applications resources are limited, for example in networked control systems (NCSs) composed of deployed sensors, actuators, communication modules and control modules [23]. In these cases, high-frequency data acquisition and use help guarantee control performance. Nevertheless, frequent computation, control actions and data exchange among the modules occupy valuable computing/communication resources, which may result in excessive energy consumption and poor control performance. Hence, the event-triggered mechanism (ETM), a type of data filtering mechanism, has been integrated into ACD/RL/ADP methods to address this problem. Under this mechanism, the current data are used to update the control strategy only when a predetermined triggering condition is violated [24]. In [25], with generalized fuzzy hyperbolic models employed to rebuild the unknown system models, an event-triggered ADP method was designed to solve the MP-NZSG problem. Under the framework of identifier-critic networks, an ADP algorithm employed the ETM to solve the zero-sum game problem of a system with partially unknown dynamics [26]. In [27], the ETM was combined with the ACD method to derive a guaranteed cost control strategy for systems subject to matched uncertainties. While adaptive tracking control has been developed in works such as [28], a novel event-driven ADP approach was presented in [29] to find an output tracking control scheme for the optimal tracking control problem.
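To make the triggering rule described above concrete, the following is a minimal Python sketch and not the scheme of any cited work. It assumes a hypothetical scalar plant x' = a*x + b*u, a fixed feedback gain k and a relative threshold sigma; all of these names and values are illustrative assumptions. The control is recomputed only when the gap between the current state and the last sampled state exceeds the threshold, and is held constant otherwise.

```python
def simulate_event_triggered(a=1.0, b=1.0, k=3.0, sigma=0.2,
                             x0=1.0, dt=1e-3, steps=5000):
    """Euler simulation of x' = a*x + b*u with the held control u = -k*x_hat.

    x_hat is the state recorded at the most recent event. An event fires when
    |x - x_hat| > sigma * |x| (a hypothetical relative triggering condition).
    Returns the final state and the number of control updates.
    """
    x, x_hat = x0, x0
    events = 1  # the initial sample counts as the first update
    for _ in range(steps):
        # Check the triggering condition; resample only when it is violated.
        if abs(x - x_hat) > sigma * abs(x):
            x_hat = x
            events += 1
        u = -k * x_hat              # control uses the held (sampled) state
        x += dt * (a * x + b * u)   # one explicit Euler step of the plant
    return x, events


if __name__ == "__main__":
    x_final, events = simulate_event_triggered()
    print(f"final state: {x_final:.3e}, control updates: {events} of 5000 steps")
```

With these assumed values the state still decays toward the origin while the control is updated in only a small fraction of the integration steps, which is the resource saving that motivates the ETM.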
Saturation phenomena exist widely in applications and are likely to degrade system performance. Hence, to realize global optimization objectives as in [30], constrained controller design has gained considerable attention. By integrating the ETM, an ADP approach based on the actor-critic structure was presented to seek the control strategy under control constraints. Under the single-critic architecture, a novel RL approach, whose advantage is that it imposes no special requirement on the initial control, was developed to design a robust controller for uncertain constrained-input systems. In [31], the off-policy learning mechanism was combined with the integral reinforcement learning technique to design an iterative method for constrained MP-NZSG problems without knowledge of the system dynamics.
The experience replay (ER) technique presented in [32] can make effective use of both recorded data and current data. To remove the persistence of excitation (PE) condition, the ER technique was introduced into the integral reinforcement learning structure to solve the MP-NZSG problem with unknown drift dynamics [33]. In [34], decentralized event-based control strategies for interconnected systems were developed by adaptive critic learning with ER.
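The role of recorded data can be illustrated with a small Python sketch. It is a generic linear-in-parameters estimation example under stated assumptions, not the tuning law of [32], [33] or [34]: the target weights `w_true`, the regressors and the learning rate are all hypothetical. The current regressor is held in a fixed direction, so it is not persistently exciting, and a normalized gradient step is taken on the current sample plus a small stack of recorded samples.

```python
import numpy as np


def er_update(w, phi_now, y_now, memory, lr=0.5):
    """One normalized gradient step on the current sample plus recorded samples."""
    grad = np.zeros_like(w)
    for phi, y in [(phi_now, y_now)] + memory:
        err = w @ phi - y                             # prediction error on this sample
        grad += phi * err / (1.0 + phi @ phi) ** 2    # normalized gradient term
    return w - lr * grad


def run(use_memory, steps=2000):
    """Return the final weight error norm with or without recorded data."""
    w_true = np.array([1.0, -2.0])                    # hypothetical ideal weights
    memory = []
    if use_memory:
        # Recorded regressors that together span the whole weight space.
        for phi in (np.array([1.0, 0.0]), np.array([0.0, 1.0])):
            memory.append((phi, w_true @ phi))
    w = np.zeros(2)
    phi_now = np.array([1.0, 1.0])                    # fixed direction: no PE
    for _ in range(steps):
        w = er_update(w, phi_now, w_true @ phi_now, memory)
    return np.linalg.norm(w - w_true)


if __name__ == "__main__":
    print("error with recorded data   :", run(True))
    print("error without recorded data:", run(False))
```

In this sketch the estimate converges only when the recorded regressors span the weight space; with the current sample alone, the error component orthogonal to the fixed regressor never shrinks. This rank condition on stored data is what replaces the PE requirement in concurrent-learning-style updates.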
In light of the above research results, this work studies online constrained event-based control for MP-ZSGs of continuous-time nonlinear systems whose model is partially unknown. The identifier-critic framework is constructed to solve the constrained MP-ZSG problem without knowledge of the drift dynamics. With the aid of the ER method, the proposed algorithm removes the PE condition by making reasonable use of recorded data and current data. Meanwhile, the ETM is introduced to design event-triggered controls that save limited computing/communication resources. Furthermore, it is proved that the critic weight errors and system states are uniformly ultimately bounded (UUB) and that Zeno behavior is excluded under the designed triggering condition.
The contributions of this work include three aspects.
1. Different from [35], [36], [37], this paper proposes an ACD method based on the identifier-critic framework. In addition, introducing the ETM further reduces the computing/communication burden.
2. Unlike [26], this paper addresses the MP-ZSG problem of nonlinear systems with constrained controls. Non-quadratic functions are used to construct the cost functions, and the corresponding event-triggered condition is designed so that the controls are updated aperiodically.
3. This paper constructs a modified tuning strategy for the critic that is derived from the ER technique. Compared with [38], the proposed method removes the PE condition by making reasonable use of recorded data and current data. Owing to this feature, it can also be regarded as a concurrent learning method.
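The non-quadratic cost named in the second contribution can be illustrated with a construction that is common in the constrained-input ADP literature. It is assumed here for illustration and is not confirmed to be the exact utility used in this paper: for a scalar input with bound `bound` and weight `r`, the penalty is 2*r*bound times the integral of atanh(v/bound) from 0 to u, and the associated control is a tanh-saturated function of a gradient term. The function and parameter names below are hypothetical.

```python
import numpy as np


def nonquadratic_cost(u, bound, r=1.0):
    """Closed form of 2*r*bound * integral_0^u atanh(v/bound) dv, for |u| < bound."""
    s = u / bound
    return 2.0 * bound * r * u * np.arctanh(s) + bound**2 * r * np.log(1.0 - s**2)


def saturated_control(grad_term, bound, r=1.0):
    """Bounded control u = -bound * tanh(grad_term / (2*bound*r)).

    grad_term stands in for the input-gain-weighted value gradient (an
    assumption of this sketch); the output never exceeds the bound in magnitude.
    """
    return -bound * np.tanh(grad_term / (2.0 * bound * r))


if __name__ == "__main__":
    for u in (0.01, 0.5, 0.9, 0.99):
        print(f"u = {u:4.2f}  cost = {nonquadratic_cost(u, 1.0):.6f}  r*u^2 = {u * u:.6f}")
    print("control for a large gradient term:", saturated_control(1e6, 2.0))
```

For small inputs the penalty behaves like the quadratic cost r*u^2, while it grows faster as the input approaches the bound, and the tanh form keeps the control inside the admissible range by construction.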
The paper is composed of five sections. The problem formulation is given in Section 2. In Section 3, we develop the ACD-based event-driven control method and present the stability analysis of the closed-loop system. A numerical example is simulated in Section 4 to demonstrate the effectiveness of the designed algorithm. In Section 5, we draw the conclusions.
Notations: $\mathbb{R}$ denotes the set containing all real numbers. $\mathbb{R}^{n}$ represents the n-dimensional Euclidean space. $\mathbb{R}^{n \times m}$ denotes the space including all real $n \times m$ matrices. is a compact set that contains the origin. represents the set of all positive integers. and respectively denote the subsets of . $\|\cdot\|$ denotes the Euclidean norm of a vector/matrix. Tr represents the trace operation. $\nabla$ is the gradient operator. $\lambda_{\min}(\cdot)$ and $\lambda_{\max}(\cdot)$ respectively denote the minimum eigenvalue and maximum eigenvalue of a matrix. $I_{n}$ denotes the n-dimensional identity matrix.
Problem formulation
Consider the continuous-time nonlinear system formulated as
where is the system state, and respectively represent the constrained control input and the disturbance vector. and . The function is Lipschitz continuous on the compact set . Moreover, system (1) is assumed to be controllable.
Define the performance index as
The utility function is given by
where
Event-based adaptive control designs for MP-ZSGs
In this section, the ETM is integrated into the ACD algorithm architecture to solve the MP-ZSG of system (1). Furthermore, considering the unknown internal dynamics, we design an identifier-critic framework by constructing an identifier network and a critic network. Finally, it is proved that the system states and the critic weight errors are UUB and that Zeno behavior is avoided under the triggering condition.
Simulations
In this part, an example is simulated to show the validity of the designed approach for solving the MP-ZSG problem with input constraints. Consider the MP-ZSG system described as
Herein, the number of players and . The initial state is set as [−1; 1] and the function . Other corresponding parameters are: and .
To learn the unknown dynamics of (72)
Conclusion
In this work, an online event-triggered ACD method has been proposed to solve MP-ZSGs of continuous-time systems with partially unknown dynamics. Non-quadratic functions are adopted to construct the utility functions in order to handle control constraints. Since the system drift dynamics are unknown, an identifier is designed using an NN to reconstruct the system model, and a critic NN is utilized to approximate the solutions of the HJIEs. Owing to the utilization of ER technique, through reasonably
CRediT authorship contribution statement
Pengda Liu: Conceptualization, Methodology, Software, Writing - original draft. Huaguang Zhang: Data curation, Supervision, Methodology, Conceptualization. He Ren: Formal analysis, Software, Visualization. Chong Liu: Writing - review & editing.
Declaration of Competing Interest
The authors declare that they have no conflicts of interest related to this work. We declare that we have no financial or personal relationships with other people or organizations that can inappropriately influence our work. There is no commercial or associative interest that represents a conflict of interest in connection with the submitted paper. I would like to declare on behalf of my co-authors that the work described is original research that has not been published previously, and not under
Acknowledgements
This work was supported by the National Key R&D Program of China under grant 2018YFA0702200, the National Natural Science Foundation of China (61627809, 61621004), and the Liaoning Revitalization Talents Program (XLYC1801005).
References (49)
- et al. Neural-network-based zero-sum game for discrete-time nonlinear systems via iterative adaptive dynamic programming algorithm. Neurocomputing (2013)
- et al. Iterative GDHP-based approximate optimal tracking control for a class of discrete-time nonlinear systems. Neurocomputing (2016)
- et al. Stability analysis of heuristic dynamic programming algorithm for nonlinear systems. Neurocomputing (2015)
- et al. Online event-triggered adaptive critic design for non-zero-sum games of partially unknown networked systems. Neurocomputing (2019)
- et al. Self-learning robust optimal control for continuous-time nonlinear systems with mismatched disturbances. Neural Netw. (2018)
- et al. Efficient model-based reinforcement learning for approximate online optimal control. Automatica (2016)
- et al. Global adaptive control of switched uncertain nonlinear systems: An improved MDADT method. Automatica (2020)
- et al. Global adaptive stabilization of stochastic high-order switched nonlinear non-lower triangular systems. Syst. Control Lett. (2020)
- et al. Integral reinforcement learning off-policy method for solving nonlinear multi-player nonzero-sum games with saturated actuator. Neurocomputing (2019)
- et al. Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem. Automatica (2010)
- Neural-network-based synchronous iteration learning method for multi-player zero-sum games. Neurocomputing
- Galerkin approximations of the generalized Hamilton-Jacobi-Bellman equation. Automatica
- An iterative adaptive dynamic programming method for solving a class of nonlinear zero-sum differential games. Automatica
- Off-policy integral reinforcement learning method to solve nonlinear continuous-time multiplayer nonzero-sum games. IEEE Trans. Neural Netw. Learn. Syst.
- Multivehicle coordinated lane change strategy in the roundabout under internet of vehicles based on game theory and cognitive computing. IEEE Trans. Ind. Informat.
- Differential-game-based guidance law using target orientation observations. IEEE Trans. Aerosp. Electron. Syst.
- Cooperative target tracking control of multiple robots. IEEE Trans. Ind. Electron.
- Approximate-optimal control algorithm for constrained zero-sum differential games through event-triggering mechanism. Nonlinear Dyn.
- An iterative relaxation approach to the solution of the Hamilton-Jacobi-Bellman-Isaacs equation in nonlinear optimal control. IEEE/CAA J. Automat. Sin.
- Relaxed control design of discrete-time Takagi-Sugeno fuzzy system: an event-triggered real-time scheduling approach. IEEE Trans. Syst. Man Cybern. Syst.
- Adaptive critic designs for event-triggered robust control of nonlinear systems with unknown dynamics. IEEE Trans. Cybern.
- Game theory-based control system algorithms with real-time reinforcement learning: How to solve multiplayer games online. IEEE Control Syst.
- Discrete-time local value iteration adaptive dynamic programming: Convergence analysis. IEEE Trans. Syst. Man Cybern. Syst.
- Linear quadratic tracking control of partially-unknown continuous-time systems using reinforcement learning. IEEE Trans. Autom. Control
Pengda Liu received the B.S. degree in automation control and the M.S. degree in control engineering from Northeastern University, Shenyang, China, in 2012 and 2014. He is currently pursuing the Ph.D. degree in control theory and control engineering with College of Information Science and Engineering, Northeastern University, Shenyang, China. His research interests include adaptive dynamic programming, reinforcement learning, and optimal control.
Huaguang Zhang (M’03–SM’04–F’14) received the B.S. degree and the M.S. degree in control engineering from Northeast Dianli University of China, Jilin City, China, in 1982 and 1985, respectively. He received the Ph.D. degree in thermal power engineering and automation from Southeast University, Nanjing, China, in 1991. He joined the Department of Automatic Control, Northeastern University, Shenyang, China, in 1992, as a Postdoctoral Fellow for two years. Since 1994, he has been a Professor and Head of the Institute of Electric Automation, School of Information Science and Engineering, Northeastern University, Shenyang, China. His main research interests are fuzzy control, stochastic system control, neural-network-based control, nonlinear control, and their applications. He has authored and coauthored over 280 journal and conference papers and six monographs, and has co-invented 90 patents. Dr. Zhang is a Fellow of IEEE, the E-letter Chair of the IEEE CIS Society, and the former Chair of the Adaptive Dynamic Programming & Reinforcement Learning Technical Committee of the IEEE Computational Intelligence Society. He is an Associate Editor of AUTOMATICA, IEEE TRANSACTIONS ON NEURAL NETWORKS, IEEE TRANSACTIONS ON CYBERNETICS, and NEUROCOMPUTING. He was an Associate Editor of IEEE TRANSACTIONS ON FUZZY SYSTEMS (2008–2013). He was awarded the Outstanding Youth Science Foundation Award from the National Natural Science Foundation Committee of China in 2003. He was named the Cheung Kong Scholar by the Education Ministry of China in 2005. He is a recipient of the IEEE Transactions on Neural Networks 2012 Outstanding Paper Award. He is also a recipient of the Andrew P. Sage Best Transactions Paper Award 2015.
He Ren received the B.S. degree in automation control in 2013 and the M.S. degree in control theory and control engineering in 2016 from Northeast Dianli University, Jilin, China. He has been pursuing the Ph.D. degree at Northeastern University, Shenyang, China, since 2016. His current research covers neural adaptive dynamic programming, neural networks, optimal control and their industrial applications.
Chong Liu received the B.S. degree in electronic and information engineering from Inner Mongolia Normal University, Inner Mongolia, China, in 2011, the M.S. degree in electronic science and technology from Changchun University of Science and Technology, Changchun, China, in 2015 and the Ph.D. degree in control theory and control engineering from Northeastern University, Shenyang, China, in 2020.
He is currently working at Xi’an University of Architecture and Technology as a Lecturer. His research interests include adaptive dynamic programming, neural networks, and optimal control.