Online event-triggered adaptive critic design for multi-player zero-sum games of partially unknown nonlinear systems with input constraints

doi:10.1016/j.neucom.2021.07.058

Neurocomputing

Volume 462, 28 October 2021, Pages 309-319

https://doi.org/10.1016/j.neucom.2021.07.058

Abstract

This paper addresses the design of an online event-triggered optimal control strategy for multi-player zero-sum games (MP-ZSGs) with control constraints when the system model is partially unknown. Non-quadratic functions are used to construct the cost functions under the control constraints. The proposed algorithm is built on an identifier-critic network framework. The unknown drift dynamics are reconstructed by an identifier neural network (INN) from input and output data. The near-optimal event-based controls and time-based disturbances are obtained by training a critic neural network (CNN). The designed event-triggered mechanism (ETM) reduces unnecessary computation and communication of the system signals, thereby saving computing/communication resources. Meanwhile, to remove the persistence of excitation (PE) condition, historical and current data are used together to construct a modified tuning law for the CNN. Theoretically, the system states and the critic weight errors are proved to be uniformly ultimately bounded (UUB) by the Lyapunov approach. Moreover, Zeno behavior is proved to be excluded under the designed triggering condition. Finally, the convergence and performance of the online method are verified by simulating a representative example.

Introduction

As a crucial branch of game theory, differential game theory, which focuses on continuous-time systems, has been extensively applied in sociology, economics, military science and other research fields [1], [2], [3]. In the control field, differential games contribute to the solution of various problems such as multivehicle coordinated lane change [4], missile interception [5] and cooperative target tracking with multiple robots [6]. In these engineering settings, a differential game is essentially an optimal control problem with multiple controllers [7]. The controllers can be regarded as players that have both team-based objectives and personal interests. According to the objectives of the players, differential games can be divided into two categories, i.e., zero-sum games and non-zero-sum games. This work focuses on MP-ZSGs, for which the goal is to solve the Hamilton-Jacobi-Isaacs equations (HJIEs), a generalized version of the Hamilton-Jacobi-Bellman (HJB) equations that arise in optimal control. For nonlinear systems, these equations are often intractable. Hence, different intelligent methods have been designed to tackle HJIEs [8], [9], [10].

Adaptive critic design (ACD) is a method that integrates approximation and dynamic programming ideas and can solve optimization problems effectively [11], [12]. ACD-based algorithms generally rely on the actor-critic framework, in which the actor collaborates with the critic to avert the "curse of dimensionality". In detail, the actor executes the control strategy using the current data, and the critic evaluates the current strategy and feeds the result back to the actor [13]. Because ACD, adaptive dynamic programming (ADP) and reinforcement learning (RL) share many ideas, they are often placed in the same category of algorithms [14]. Recently, various algorithms based on ACD/ADP/RL have been developed to solve optimization problems [15], [16], [17]. For example, in [18], an online RL method was developed to address the linear quadratic tracking problem without knowledge of the system drift dynamics. The optimal control of unknown systems with disturbances was addressed through adaptive fuzzy control in [19] and through an off-policy algorithm in [20]. Under the critic-actor framework, a near-optimal control scheme for nonzero-sum games of nonlinear systems was developed in [21] using single-network ADP. In [22], a NN was employed to reconstruct the unknown system dynamics and a single-critic ADP-based method was designed to learn the solutions of multi-player nonzero-sum games (MP-NZSGs).

In general, resources are limited in some specific applications, for example, networked control systems (NCSs) composed of deployed sensors, actuators, communication modules and control modules [23]. In these cases, high-frequency data acquisition and use can help guarantee control performance. Nevertheless, frequent computing, control actions and data exchange among the modules occupy valuable computing/communication resources, which may well result in excessive energy consumption and poor control performance. Hence, ETM, a type of data filtering mechanism, is integrated into ACD/RL/ADP methods to address this issue. Under this mechanism, the current data is selected to update the control strategy only when the predetermined triggering condition is violated [24]. In [25], with unknown system models rebuilt by generalized fuzzy hyperbolic models, an event-triggered ADP method was designed to solve the MP-NZSGs problem. Under the identifier-critic network framework, an ADP algorithm employed ETM to solve the zero-sum game problem for a system with partially unknown dynamics [26]. In [27], ETM was combined with the ACD method to derive a guaranteed cost control strategy for systems subject to matched uncertainties. While adaptive tracking control has been developed in [28], an event-driven ADP approach was presented in [29] to find an output tracking control scheme for the optimal tracking control problem.

Saturation phenomena exist widely in applications and are likely to degrade system performance. Hence, to realize a global optimization objective as in [30], constrained controller design has attracted considerable attention. By integrating ETM, an ADP approach based on the actor-critic structure was presented to seek the control strategy under control constraints. Under the single-critic architecture, an RL approach that imposes no special requirement on the initial control was developed to design a robust controller for uncertain constrained-input systems. In [31], the off-policy learning mechanism was combined with the integral reinforcement learning technique to design an iterative method for constrained MP-NZSGs without knowledge of the system dynamics.

The experience replay (ER) technique presented in [32] makes effective use of both recorded data and current data. To remove the PE condition, the ER technique was introduced into the integral reinforcement learning structure to solve the MP-NZSGs problem with unknown drift dynamics [33]. In [34], decentralized event-based control strategies for interconnected systems were developed by adaptive critic learning with ER.

In light of the above research results, this work studies online constrained event-based control for MP-ZSGs of continuous-time nonlinear systems whose model is partially unknown. The identifier-critic framework is constructed to solve the constrained MP-ZSGs problem without knowledge of the drift dynamics. With the aid of the ER method, the proposed algorithm removes the PE condition by making reasonable use of the recorded data and the current data. Meanwhile, the ETM is introduced to design event-triggered controls that save limited computing/communication resources. Furthermore, it is proved that the critic weight errors and the system states are UUB and that Zeno behavior is excluded under the designed triggering condition.

The contributions of this work include three aspects.

1. Different from [35], [36], [37], this paper proposes an ACD method with an identifier-critic framework. In addition, the introduction of ETM further reduces the computing/communication burden.
2. Unlike [26], this paper addresses the MP-ZSGs problem for a nonlinear system with constrained controls. Non-quadratic functions are used to construct the cost functions, and the corresponding event-triggered condition is designed such that the controls are updated in an aperiodic fashion.
3. This paper constructs a modified tuning strategy for the critic that is derived from the ER technique. Compared with [38], the proposed method removes the PE condition by making reasonable use of the recorded data and the current data. Owing to this property, it can also be regarded as a concurrent learning method.
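The idea behind the third contribution can be illustrated with a minimal sketch. It is not the tuning law of the paper, which this excerpt does not reproduce; it is a generic discrete-time, normalized gradient step of the kind used in concurrent learning, assuming a critic that is linear in its weights with residual $e = σ^{⊤} W + r$. The names `residual`, `memory_is_rich`, `replay_step`, the step size `lr` and the toy data are all hypothetical.

```python
import numpy as np

def residual(W, sigma, r):
    """Residual e = sigma^T W + r of one data sample (assumed linear-in-weights critic)."""
    return float(sigma @ W + r)

def memory_is_rich(memory, dim):
    """Check that the recorded regressors span R^dim, the condition that replaces PE here."""
    stack = np.array([sigma for sigma, _ in memory])
    return np.linalg.matrix_rank(stack) == dim

def replay_step(W, current, memory, lr=1.0):
    """One normalized gradient step on the current sample plus all recorded samples."""
    grad = np.zeros_like(W)
    for sigma, r in [current, *memory]:
        grad += sigma * residual(W, sigma, r) / (1.0 + sigma @ sigma) ** 2
    return W - lr * grad

# Toy data: residuals vanish exactly at W_true (hypothetical values).
W_true = np.array([1.0, -2.0, 0.5])
make_sample = lambda sigma: (sigma, -float(sigma @ W_true))
memory = [make_sample(s) for s in np.eye(3)]        # recorded data, full rank
current = make_sample(np.array([1.0, 1.0, 0.0]))    # current data, not exciting by itself

W = np.zeros(3)
for _ in range(300):
    W = replay_step(W, current, memory)
```

With an empty `memory`, every update direction is parallel to the single current regressor, so the third weight would never move; the recorded samples are what make the weights identifiable in this toy setting.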

The paper is composed of five sections. The problem formulation is given in Section 2. In Section 3, we develop the ACD-based event-driven control method and present the stability analysis of the closed-loop system. A numerical example is simulated in Section 4 to demonstrate the effectiveness of the designed algorithm. Section 5 draws the conclusions.

Notations: $R$ denotes the set of all real numbers. $R^{n}$ represents the n-dimensional Euclidean space. $R^{n \times m}$ denotes the space of all $n \times m$ real matrices. $ϒ$ is a compact set that contains the origin. $N^{+}$ represents the set of all positive integers. $N = \{1, \dots, N\}$ and $M = \{1, \dots, M\}$ denote subsets of $N^{+}$. $‖ \cdot ‖$ denotes the Euclidean norm of a vector/matrix. $\mathrm{Tr} (\cdot)$ represents the trace operation. $\nabla (\cdot) ≜ \partial (\cdot) / \partial x$ is the gradient operator. $λ_{em} (\cdot)$ and $λ_{eM} (\cdot)$ denote the minimum and maximum eigenvalues of a matrix, respectively. $I_{n \times n}$ denotes the n-dimensional identity matrix.

Problem formulation

Consider the continuous-time nonlinear system formulated as $\dot{x} = f (x) + \sum_{k = 1}^{N} g_{k} (x) u_{k} + \sum_{l = 1}^{M} h_{l} (x) w_{l}, \qquad (1)$ where $x \in ϒ \subset R^{n}$ is the system state, and $u_{k} \in μ \subset R^{m_{k}}$ and $w_{l} \in ω \subset R^{m_{l}}$ represent the constrained control input and the disturbance vector, respectively, with $μ = \{u_{1}, \dots, u_{N}\}$ and $ω = \{w_{1}, \dots, w_{M}\}$. The function $f (x)$ is Lipschitz continuous on the compact set $ϒ$. Moreover, system (1) is assumed to be controllable.

Define the performance index as $J (x_{0}, μ, ω) = \int_{0}^{\infty} φ (x, μ, ω) d ς .$

The utility function $φ (x, μ, ω)$ is given by $φ (x, μ, ω) = x^{⊤} Px + χ (μ) - \sum_{l = 1}^{M} w_{l}^{⊤} S_{l} w_{l},$ where $χ ($

Event-based adaptive control designs for MP-ZSGs

In this section, ETM is integrated into the ACD algorithm architecture to solve the MP-ZSGs problem of system (1). Furthermore, because the internal dynamics are unknown, we design an identifier-critic framework by constructing an identifier network and a critic network. Finally, it is proved that the system states and the critic weight errors are UUB and that Zeno behavior is avoided under the triggering condition.
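The way an event-triggered mechanism holds the control between triggering instants can be sketched as follows. This is only an illustration under stated assumptions: the paper's actual triggering condition is not reproduced in this excerpt, so the state-dependent `threshold`, the toy plant `f`, the feedback `policy` and the forward-Euler integration are all hypothetical choices.

```python
import numpy as np

def run_event_triggered(f, policy, x0, threshold, dt=0.01, steps=1000):
    """Simulate x' = f(x, u) with u held constant between triggering instants."""
    x = np.asarray(x0, dtype=float)
    x_sampled = x.copy()            # state recorded at the last triggering instant
    u = policy(x_sampled)           # control computed from the sampled state only
    events = 0
    for _ in range(steps):
        gap = np.linalg.norm(x_sampled - x)
        if gap > threshold(x):      # triggering condition violated: sample and update
            x_sampled = x.copy()
            u = policy(x_sampled)
            events += 1
        x = x + dt * f(x, u)        # forward-Euler step with the held control
    return x, events

# Hypothetical toy plant, feedback law and threshold, for illustration only.
f = lambda x, u: -x + u
policy = lambda x: -0.5 * x
threshold = lambda x: 0.2 * np.linalg.norm(x) + 1e-3

x_final, n_events = run_event_triggered(f, policy, [1.0, -1.0], threshold)
```

In this sketch the control is recomputed only at the events counted in `n_events`, which is far fewer than the number of integration steps; that reduction in updates is the kind of resource saving the ETM is meant to provide.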

Simulations

In this part, an example is simulated to show the validity of the designed approach for solving the MP-ZSGs problem with input constraints. Consider the MP-ZSG system described by $\dot{x} = \begin{bmatrix} 0.35 x_{2} - 0.4 x_{1} \\ - 0.3 x_{1} - 0.3 x_{2} + 0.15 x_{1}^{2} x_{2} \end{bmatrix} + \begin{bmatrix} 0.15 \\ 0.3 \end{bmatrix} u_{1} + \begin{bmatrix} 0 \\ 0.2 \cos (2 x_{2}) \end{bmatrix} u_{2} + \begin{bmatrix} 0 \\ 0.4 \sin (2 x_{1}) \end{bmatrix} w_{1} + \begin{bmatrix} 0.4 \\ 0.4 \end{bmatrix} w_{2} .$

Here, the numbers of players are $N = 2$ and $M = 2$. The initial state is set as $x_{0} = [-1, 1]^{⊤}$ and the function $ψ_{k} = α_{k} \tanh (\cdot), k = 1, 2$. The other parameters are $R_{1} = R_{2} = 0.6, S_{1} = S_{2} = 0.5, P = 0.5$ and $α_{1} = α_{2} = 0.1$.
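For readers who want to experiment with this example, the system above can be coded directly. The sketch below takes the drift, input and disturbance terms, the bounds $α_{k}$ and the initial state from the text; everything else is an assumption made for illustration, in particular the forward-Euler integration, the step size, the placeholder feedback `saturate(-x)` and the zero disturbance. It is not the learned near-optimal policy of the paper.

```python
import numpy as np

ALPHA = np.array([0.1, 0.1])   # saturation bounds alpha_1, alpha_2 from the text

def drift(x):
    """Drift term f(x) of the simulated system."""
    x1, x2 = x
    return np.array([0.35 * x2 - 0.4 * x1,
                     -0.3 * x1 - 0.3 * x2 + 0.15 * x1**2 * x2])

def g(x):
    """Control input matrix with columns g_1(x) and g_2(x)."""
    _, x2 = x
    return np.array([[0.15, 0.0],
                     [0.30, 0.2 * np.cos(2 * x2)]])

def h(x):
    """Disturbance input matrix with columns h_1(x) and h_2(x)."""
    x1, _ = x
    return np.array([[0.0, 0.4],
                     [0.4 * np.sin(2 * x1), 0.4]])

def saturate(v):
    """Bounded control psi_k(v) = alpha_k * tanh(v), applied element-wise."""
    return ALPHA * np.tanh(v)

def x_dot(x, u, w):
    """Right-hand side of the system for controls u = (u1, u2), disturbances w = (w1, w2)."""
    return drift(x) + g(x) @ u + h(x) @ w

def rollout(x0, u_fn, w_fn, dt=0.01, steps=2000):
    """Forward-Euler rollout (assumed integration scheme, not taken from the paper)."""
    x = np.asarray(x0, dtype=float)
    traj = [x.copy()]
    for _ in range(steps):
        x = x + dt * x_dot(x, u_fn(x), w_fn(x))
        traj.append(x.copy())
    return np.array(traj)

x0 = np.array([-1.0, 1.0])
# Placeholder feedback and zero disturbance, chosen only to exercise the model.
traj = rollout(x0, u_fn=lambda x: saturate(-x), w_fn=lambda x: np.zeros(2))
```

Because the controls pass through `saturate`, each $u_{k}$ stays within $[-α_{k}, α_{k}]$ regardless of the argument, which is how the $\tanh$ form enforces the input constraint.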

To learn the unknown dynamics of (72)

Conclusion

In this work, an online event-triggered ACD method has been proposed to solve MP-ZSGs of continuous-time systems with partially unknown dynamics. Non-quadratic functions are adopted to construct the utility functions under control constraints. Because the system drift dynamics are unknown, an identifier is designed using a NN to reconstruct the system model, and a critic NN is used to approximate the solutions of the HJIEs. Owing to the utilization of ER technique, through reasonably

CRediT authorship contribution statement

Pengda Liu: Conceptualization, Methodology, Software, Writing - original draft. Huaguang Zhang: Data curation, Supervision, Methodology, Conceptualization. He Ren: Formal analysis, Software, Visualization. Chong Liu: Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no conflicts of interest related to this work. We declare that we have no financial or personal relationships with other people or organizations that can inappropriately influence our work. There is no commercial or associative interest that represents a conflict of interest in connection with the paper submitted. I would like to declare on behalf of my co-authors that the work described is original research that has not been published previously, and not under

Acknowledgements

This work was supported by the National Key R&D Program of China under grant 2018YFA0702200, the National Natural Science Foundation of China (61627809, 61621004), and the Liaoning Revitalization Talents Program (XLYC1801005).

Pengda Liu received the B.S. degree in automation control and the M.S. degree in control engineering from Northeastern University, Shenyang, China, in 2012 and 2014. He is currently pursuing the Ph.D. degree in control theory and control engineering with College of Information Science and Engineering, Northeastern University, Shenyang, China. His research interests include adaptive dynamic programming, reinforcement learning, and optimal control.

References (49)

D. Liu et al., Neural-network-based zero-sum game for discrete-time nonlinear systems via iterative adaptive dynamic programming algorithm, Neurocomputing (2013)

C. Mu et al., Iterative GDHP-based approximate optimal tracking control for a class of discrete-time nonlinear systems, Neurocomputing (2016)

T. Feng et al., Stability analysis of heuristic dynamic programming algorithm for nonlinear systems, Neurocomputing (2015)

H. Su et al., Online event-triggered adaptive critic design for non-zero-sum games of partially unknown networked systems, Neurocomputing (2019)

X. Yang et al., Self-learning robust optimal control for continuous-time nonlinear systems with mismatched disturbances, Neural Netw. (2018)

R. Kamalapurkar et al., Efficient model-based reinforcement learning for approximate online optimal control, Automatica (2016)

B. Niu et al., Global adaptive control of switched uncertain nonlinear systems: An improved MDADT method, Automatica (2020)

B. Niu et al., Global adaptive stabilization of stochastic high-order switched nonlinear non-lower triangular systems, Syst. Control Lett. (2020)

H. Ren et al., Integral reinforcement learning off-policy method for solving nonlinear multi-player nonzero-sum games with saturated actuator, Neurocomputing (2019)

K.G. Vamvoudakis et al., Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem, Automatica (2010)

R. Song et al., Neural-network-based synchronous iteration learning method for multi-player zero-sum games, Neurocomputing (2017)

R.W. Beard et al., Galerkin approximations of the generalized Hamilton-Jacobi-Bellman equation, Automatica (1997)

H. Zhang et al., An iterative adaptive dynamic programming method for solving a class of nonlinear zero-sum differential games, Automatica (2011)

R. Song et al., Off-policy integral reinforcement learning method to solve nonlinear continuous-time multiplayer nonzero-sum games, IEEE Trans. Neural Netw. Learn. Syst. (2017)

N. Ding et al., Multivehicle coordinated lane change strategy in the roundabout under internet of vehicles based on game theory and cognitive computing, IEEE Trans. Ind. Informat. (2020)

Y. Oshman et al., Differential-game-based guidance law using target orientation observations, IEEE Trans. Aerosp. Electron. Syst. (2006)

Z. Wang et al., Cooperative target tracking control of multiple robots, IEEE Trans. Ind. Electron. (2012)

C. Mu et al., Approximate-optimal control algorithm for constrained zero-sum differential games through event-triggering mechanism, Nonlinear Dyn. (2019)

M.D.S. Aliyu, An iterative relaxation approach to the solution of the Hamilton-Jacobi-Bellman-Isaacs equation in nonlinear optimal control, IEEE/CAA J. Automat. Sin. (2018)

X. Xie et al., Relaxed control design of discrete-time Takagi-Sugeno fuzzy system: an event-triggered real-time scheduling approach, IEEE Trans. Syst., Man, Cybern., Syst. (2018)

X. Yang et al., Adaptive critic designs for event-triggered robust control of nonlinear systems with unknown dynamics, IEEE Trans. Cybern. (2019)

K.G. Vamvoudakis et al., Game theory-based control system algorithms with real-time reinforcement learning: How to solve multiplayer games online, IEEE Control Syst. (2017)

Q. Wei et al., Discrete-time local value iteration adaptive dynamic programming: Convergence analysis, IEEE Trans. Syst., Man, Cybern., Syst. (2018)

H. Modares et al., Linear quadratic tracking control of partially-unknown continuous-time systems using reinforcement learning, IEEE Trans. Autom. Control (2014)

Huaguang Zhang (M’03–SM’04–F’14) received the B.S. degree and the M.S. degree in control engineering from Northeast Dianli University of China, Jilin City, China, in 1982 and 1985, respectively. He received the Ph.D. degree in thermal power engineering and automation from Southeast University, Nanjing, China, in 1991. He joined the Department of Automatic Control, Northeastern University, Shenyang, China, in 1992, as a Postdoctoral Fellow for two years. Since 1994, he has been a Professor and Head of the Institute of Electric Automation, School of Information Science and Engineering, Northeastern University, Shenyang, China. His main research interests are fuzzy control, stochastic system control, neural networks based control, nonlinear control, and their applications. He has authored and coauthored over 280 journal and conference papers and six monographs, and has co-invented 90 patents. Dr. Zhang is a Fellow of the IEEE, the E-letter Chair of the IEEE CIS Society, and the former Chair of the Adaptive Dynamic Programming & Reinforcement Learning Technical Committee of the IEEE Computational Intelligence Society. He is an Associate Editor of AUTOMATICA, IEEE TRANSACTIONS ON NEURAL NETWORKS, IEEE TRANSACTIONS ON CYBERNETICS, and NEUROCOMPUTING. He was an Associate Editor of IEEE TRANSACTIONS ON FUZZY SYSTEMS (2008–2013). He was awarded the Outstanding Youth Science Foundation Award from the National Natural Science Foundation Committee of China in 2003. He was named the Cheung Kong Scholar by the Education Ministry of China in 2005. He is a recipient of the IEEE Transactions on Neural Networks 2012 Outstanding Paper Award. He is also a recipient of the Andrew P. Sage Best Transactions Paper Award 2015.

He Ren received the B.S. degree in automation control in 2013 and the M.S. degree in control theory and control engineering in 2016 from Northeast Dianli University, Jilin, China. He has been pursuing the Ph.D. degree since 2016 at Northeastern University, Shenyang, China. His current research covers neural adaptive dynamic programming, neural networks, optimal control and their industrial applications.

Chong Liu received the B.S. degree in electronic and information engineering from Inner Mongolia Normal University, Inner Mongolia, China, in 2011, the M.S. degree in electronic science and technology from Changchun University of Science and Technology, Changchun, China, in 2015 and the Ph.D. degree in control theory and control engineering from Northeastern University, Shenyang, China, in 2020. He is currently working at Xi’an University of Architecture and Technology as a Lecturer. His research interests include adaptive dynamic programming, neural networks, and optimal control.
