Neurocomputing

Volume 242, 14 June 2017, Pages 73-82

Neural-network-based synchronous iteration learning method for multi-player zero-sum games

https://doi.org/10.1016/j.neucom.2017.02.051

Abstract

In this paper, a synchronous solution method for multi-player zero-sum (ZS) games without system dynamics is established based on neural networks. A policy iteration (PI) algorithm is presented to solve the Hamilton–Jacobi–Bellman (HJB) equation, and it is proven that the resulting iterative cost function converges to the optimal game value. To avoid requiring the system dynamics, an off-policy learning method based on PI is given to obtain the iterative cost function, controls and disturbances. A critic neural network (CNN), action neural networks (ANNs) and disturbance neural networks (DNNs) are used to approximate the cost function, controls and disturbances, respectively. The weights of these neural networks compose the synchronous weight matrix, which is proven to be uniformly ultimately bounded (UUB). Two examples are given to show the effectiveness of the proposed synchronous solution method for multi-player ZS games.

Introduction

The importance of strategic behavior in the human and social world is increasingly recognized in theory and practice. As a result, game theory has emerged as a fundamental instrument in pure and applied research [1]. Modern day society relies on the operation of complex systems, including aircraft, automobiles, electric power systems, economic entities, business organizations, banking and finance systems, computer networks, manufacturing systems, and industrial processes. Networked dynamical agents have cooperative team-based goals as well as individual selfish goals, and their interplay can be complex and yield unexpected results in terms of emergent teams. Cooperation and conflict of multiple decision-makers for such systems can be studied within the field of cooperative and noncooperative game theory [2]. It is known that many real-world systems are controlled by more than one controller or decision maker, each using an individual strategy. These controllers often operate as a group, with a general quadratic performance index function, as a game. Therefore, many scholars have studied multi-player games. In [3], an off-policy integral reinforcement learning method was developed to solve nonlinear continuous-time multi-player non-zero-sum (NZS) games. In [4], multi-player zero-sum (ZS) differential games for a class of continuous-time uncertain nonlinear systems were solved using upper and lower iterations. ZS game theory relies on solving the Hamilton–Jacobi–Isaacs (HJI) equations, a generalized version of the Hamilton–Jacobi–Bellman (HJB) equations appearing in optimal control problems. In the nonlinear case the HJI equations are difficult or impossible to solve, and may not have global analytic solutions even in simple cases. Therefore, many approximate methods have been proposed to obtain the solution of the HJI equations [5], [6], [7], [8].
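For reference, in the standard two-player ZS setting with dynamics $\dot{x} = f(x) + g(x)u + h(x)d$, value function $V$, and attenuation level $\gamma$, the HJI equation mentioned above takes the textbook form below (stated here for orientation; it is not a formula reproduced from this paper):

```latex
0 = \min_{u}\,\max_{d}\Big[\, x^{\top}Qx + u^{\top}Ru - \gamma^{2} d^{\top}d
      + \big(\nabla V(x)\big)^{\top}\big(f(x) + g(x)u + h(x)d\big) \Big],
```

with the saddle-point policies $u^{*} = -\tfrac{1}{2}R^{-1}g^{\top}(x)\nabla V(x)$ and $d^{*} = \tfrac{1}{2\gamma^{2}}h^{\top}(x)\nabla V(x)$.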

Adaptive dynamic programming (ADP) is an effective approximate method in the optimal control field [9], [10], [11], [12], [13]. ADP algorithms include value iteration (VI) and policy iteration (PI) [14], [15], [16], [17]. VI is a Lyapunov recursion, which is easy to implement and does not require Lyapunov equation solutions [18], [19], [20], [21]. In [22], discrete-time VI was proposed to solve the HJB equation approximately, with convergence analysis. In [23], a novel non-model-based, data-driven adaptive optimal controller was presented via continuous-time VI. In [24], a class of continuous-time nonlinear two-player ZS differential games was considered, and a VI ADP method was proposed for the cases in which the saddle point exists or does not exist. On the other hand, PI refers to a class of algorithms built as a two-step iteration: policy evaluation and policy improvement [25], starting from evaluating the performance index function of a given initial admissible (stabilizing) controller [26], [27], [28]. In [29], a PI algorithm and its convergence analysis were given for nonlinear systems with saturating actuators. In [30], optimal model-free output synchronization of heterogeneous systems using off-policy reinforcement learning was developed. In [31], a data-driven ADP method was proposed for ZS optimal control problems of a class of continuous-time nonlinear systems with unknown dynamics. In [32], an online solution method for two-player ZS games was presented via synchronous PI.
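The two-step PI iteration described above (policy evaluation, then policy improvement) can be sketched in the special case of a continuous-time linear-quadratic regulator, where evaluation reduces to a Lyapunov equation and improvement to a gain update (Kleinman's algorithm). The matrices `A`, `B`, `Q`, `R` below are illustrative assumptions for the sketch, not values from the paper:

```python
import numpy as np

# Illustrative linear system dx/dt = A x + B u with cost integral(x'Qx + u'Ru)dt.
# A, B, Q, R are assumed example values; A is Hurwitz so K = 0 is admissible.
A = np.array([[0.0, 1.0], [-1.0, -2.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.eye(1)
n = A.shape[0]

def lyap(F, W):
    """Solve F' P + P F + W = 0 for P via the Kronecker (vectorization) trick."""
    M = np.kron(np.eye(n), F.T) + np.kron(F.T, np.eye(n))
    return np.linalg.solve(M, -W.flatten(order="F")).reshape((n, n), order="F")

K = np.zeros((1, n))  # initial admissible (stabilizing) gain
for _ in range(20):
    # Policy evaluation: cost matrix of the current policy u = -K x
    P = lyap(A - B @ K, Q + K.T @ R @ K)
    # Policy improvement: K = R^{-1} B' P
    K = np.linalg.solve(R, B.T @ P)

# At convergence, P satisfies the algebraic Riccati equation
residual = A.T @ P + P @ A - P @ B @ np.linalg.solve(R, B.T @ P) + Q
print(np.max(np.abs(residual)))
```

Each pass evaluates the current stabilizing gain exactly and then improves it; the iterates converge monotonically to the Riccati solution, mirroring the cost-function convergence the paper proves in the nonlinear multi-player setting.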

Although the progress on ADP algorithms is significant in the optimal control field, to the best of our knowledge it remains an open problem how to solve multi-player ZS games for completely unknown continuous-time nonlinear systems. In this paper, this open problem is addressed. The main contributions of this paper are summarized as follows.

  • (1)

    A synchronous solution method based on PI algorithm and neural networks is established.

  • (2)

It is proven that, for the traditional PI algorithm with known system dynamics, the iterative cost function converges to the optimal game value.

  • (3)

A synchronous solution method is given to solve the off-policy HJB equation, with convergence analysis, using a critic neural network (CNN), action neural networks (ANNs) and disturbance neural networks (DNNs).

  • (4)

The synchronous weight matrix is proven to be uniformly ultimately bounded (UUB).

The rest of this paper is organized as follows. In Section 2, we present the motivations and preliminaries of the discussed problem. In Section 3, the synchronous solution of multi-player ZS games is developed and the convergence proof is given. In Section 4, two examples are given to demonstrate the effectiveness of the proposed scheme. In Section 5, the conclusion is drawn.

Section snippets

Motivations and preliminaries

In this paper, we consider the continuous-time nonlinear system described by
$$\dot{x} = f(x) + g(x)\sum_{i=1}^{p} u_i + h(x)\sum_{j=1}^{q} d_j,$$
where $x \in \Omega \subset \mathbb{R}^n$ is the system state, and $u_i \in \mathbb{R}^{m_1}$ and $d_j \in \mathbb{R}^{m_2}$ are the control inputs and the disturbance inputs, respectively. $f(x) \in \mathbb{R}^n$, $g(x)$ and $h(x)$ are unknown functions, $f(0) = 0$, and $x = 0$ is an equilibrium point of the system. Assume that $f(x)$, $g(x)$ and $h(x)$ are locally Lipschitz on the compact set $\Omega$ that contains the origin, and that the dynamical system is stabilizable on $\Omega$. The performance
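The snippet above cuts off while introducing the performance index. For multi-player ZS games of this kind, the index is typically of the quadratic form below; this is a standard form stated as an assumption, since the paper's exact definition is not visible in the snippet:

```latex
J(x_0, u_1,\dots,u_p, d_1,\dots,d_q)
  = \int_{0}^{\infty} \Big( x^{\top}Qx
      + \sum_{i=1}^{p} u_i^{\top} R_i u_i
      - \sum_{j=1}^{q} \gamma_j^{2}\, d_j^{\top} d_j \Big)\,\mathrm{d}t,
```

which the controls $u_i$ seek to minimize and the disturbances $d_j$ seek to maximize.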

Synchronous solution of multi-player ZS games

In this section, off-policy algorithm will be proposed based on Algorithm 1. The neural networks implementation process is also given. Based on that, the stability of the synchronous solution method is proven.
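Although the full implementation is not shown in this snippet, the core idea of the critic approximation is to represent the cost as a linear combination of basis functions, $V(x) \approx W^{\top}\varphi(x)$, and to fit the weights $W$ from sampled data. The following minimal sketch fits such a critic to a known quadratic value function by least squares; the basis, target function, and sampling scheme are illustrative assumptions, not the paper's actual update law:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed target value function V(x) = x' P x for a 2-D state.
P = np.array([[2.0, 0.5], [0.5, 1.0]])

def phi(x):
    """Quadratic polynomial basis, a common choice for a critic network."""
    return np.array([x[0] ** 2, x[0] * x[1], x[1] ** 2])

# Sample states and target values, then fit the critic weights W in
# V(x) ~ W' phi(x) by least squares.
X = rng.uniform(-1.0, 1.0, size=(200, 2))
targets = np.array([x @ P @ x for x in X])
Phi = np.array([phi(x) for x in X])
W, *_ = np.linalg.lstsq(Phi, targets, rcond=None)

print(W)  # recovers [P00, 2*P01, P11] = [2.0, 1.0, 1.0]
```

In the paper's synchronous scheme, the CNN, ANN and DNN weights are instead updated simultaneously along system trajectories, and the UUB analysis bounds the resulting weight errors.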

Simulation study

In this section, two examples will be provided to demonstrate the effectiveness of the optimal control scheme proposed in this paper.

Conclusions

This paper proposed a synchronous solution method for multi-player zero-sum games without system dynamics, based on neural networks. A PI algorithm is presented to solve the HJB equation with known system dynamics. It is proven that the iterative cost function obtained by PI converges to the optimal game value. Based on PI, an off-policy learning method is given to obtain the iterative cost function, controls and disturbances. The weights of the CNN, ANNs and DNNs compose the synchronous weight matrix, which is

Acknowledgment

This work was supported in part by the National Natural Science Foundation of China under Grants 61304079, 61673054, and 61374105, in part by the Fundamental Research Funds for the Central Universities under Grant FRF-TP-15-056A3, and in part by the Open Research Project from SKLMCCS under Grant 20150104.

References (42)

  • D. Vrabie et al.

    Adaptive optimal control for continuous-time linear systems based on policy iteration

    Automatica

    (2009)
  • R. Song et al.

    Adaptive dynamic programming for a class of complex-valued nonlinear systems

    IEEE Trans. Neural Netw. Learn. Syst.

    (2014)
  • M. Abu-Khalaf et al.

    Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach

    Automatica

    (2005)
  • K. Vamvoudakis et al.

    Online solution of nonlinear two player zero-sum games using synchronous policy iteration

    Int. J. Robust Nonlinear Control

    (2012)
  • F. Lewis et al.

    Optimal Control

    (2012)
  • D.W.K. Yeung et al.

    Cooperative Stochastic Differential Games

    (2006)
  • F. Lewis et al.

    Optimal Control, Third Edition

    (2012)
  • R. Song et al.

    Off-policy integral reinforcement learning method to solve nonlinear continuous-time multi-player non-zero-sum games

    IEEE Trans. Neural Netw. Learn. Syst.

    (2017)
  • D. Liu et al.

    Multiperson zero-sum differential games for a class of uncertain nonlinear systems

    Int. J. Adapt. Control Signal Process.

    (2014)
  • C. Mu et al.

    Iterative GDHP-based approximate optimal tracking control for a class of discrete-time nonlinear systems

    Neurocomputing

    (2016)
  • X. Fang et al.

    Data-driven heuristic dynamic programming with virtual reality

    Neurocomputing

    (2015)
    Ruizhuo Song received the Ph.D. degree in control theory and control engineering from Northeastern University, Shenyang, China, in 2012. She was a postdoctoral fellow with University of Science and Technology Beijing, Beijing, China. She is currently an Associate Professor with the School of Automation and Electrical Engineering, University of Science and Technology Beijing. She was a Visiting Scholar with the Department of Electrical Engineering at University of Texas at Arlington, Arlington, TX, USA, from 2013 to 2014. Her current research interests include optimal control, neural-networks-based control, nonlinear control, wireless sensor networks, and adaptive dynamic programming and their industrial application. She has published over 40 journal and conference papers, and coauthored 2 monographs.

    Qinglai Wei received the B.S. degree in Automation, and the Ph.D. degree in control theory and control engineering, from the Northeastern University, Shenyang, China, in 2002 and 2009, respectively. From 2009–2011, he was a postdoctoral fellow with The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China. He is currently a Professor of the institute. He has authored three books, and published over 60 international journal papers. His research interests include adaptive dynamic programming, neural-networks-based control, optimal control, nonlinear systems and their industrial applications. Dr. Wei is an Associate Editor of IEEE Transaction on Systems Man, and Cybernetics: Systems since 2016, Information Sciences since 2016, Neurocomputing since 2016, Optimal Control Applications and Methods since 2016, Acta Automatica Sinica since 2015, and has been holding the same position for IEEE Transactions on Neural Networks and Learning Systems during 2014–2015. He is the Secretary of IEEE Computational Intelligence Society (CIS) Beijing Chapter since 2015. He was Registration Chair of the 12th World Congress on Intelligent Control and Automation (WCICA2016), 2014 IEEE World Congress on Computational Intelligence (WCCI2014), the 2013 International Conference on Brain Inspired Cognitive Systems (BICS 2013), and the Eighth International Symposium on Neural Networks (ISNN 2011). He was the Publication Chair of 5th International Conference on Information Science and Technology (ICIST2015) and the Ninth International Symposium on Neural Networks (ISNN 2012). He was the Finance Chair of the 4th International Conference on Intelligent Control and Information Processing (ICICIP 2013) and the Publicity Chair of the 2012 International Conference on Brain Inspired Cognitive Systems (BICS 2012). He was guest editors for several international journals. 
He was a recipient of Shuang-Chuang Talents in Jiangsu Province, China, in 2014. He was a recipient of the Outstanding Paper Award of Acta Automatica Sinica in 2011 and Zhang Siying Outstanding Paper Award of Chinese Control and Decision Conference (CCDC) in 2015. He was a recipient of Young Researcher Award of Asia Pacific Neural Network Society (APNNS) in 2016.

    Biao Song received the B.S. degree in electronic information engineering from Yanbian University, Yanbian, China, in 2011. He is currently a Ph.D. student with the School of Automation and Electrical Engineering, University of Science and Technology Beijing. His research interests include wireless sensor networks, adaptive dynamic programming.
