Reinforcement learning and neural networks for multi-agent nonzero-sum games of nonlinear constrained-input systems

Yasini, Sholeh; Naghibi Sitani, Mohammad Bagher; Kirampor, Ali

doi:10.1007/s13042-014-0300-y

Original Article
Published: 15 October 2014

Volume 7, pages 967–980, (2016)
International Journal of Machine Learning and Cybernetics

Sholeh Yasini¹,
Mohammad Bagher Naghibi Sitani¹ &
Ali Kirampor¹

An Erratum to this article was published on 18 February 2015

Abstract

This paper presents an online adaptive optimal control method based on reinforcement learning to solve the multi-agent nonzero-sum (NZS) differential games of nonlinear constrained-input continuous-time systems. A non-quadratic cost functional associated with each agent is employed to encode the saturation nonlinearity into the NZS game. The algorithm is implemented as a separate actor-critic neural network (NN) structure for every participant in the game, where adaptation of both NNs is performed simultaneously and continuously. The technique of concurrent learning is utilized to obtain novel update laws for the critic NN weights. That is, recorded data and current data are used concurrently for adaptation of the critic NN weights. This results in an algorithm where an easier and verifiable condition is sufficient for parameter convergence rather than the restrictive persistence of excitation (PE) condition. The stability of the closed-loop systems is guaranteed and the convergence to the Nash equilibrium solution of the game is shown. Simulation results show the effectiveness of the proposed method.

References

1. Shah V (1998) Power control for wireless data services based on utility and pricing. Dissertation, Rutgers University
2. Mukaidani H (2007) Newton’s method for solving cross-coupled sign-indefinite algebraic Riccati equations for weakly coupled large-scale systems. J Appl Math Comput 188(1):103–115
3. Isaacs R (1965) Differential Games. Wiley, New York
4. Starr A, Ho Y (1969) Nonzero-sum differential games. J Optim Theory Appl 3(3):148–206
5. Basar T, Olsder GJ (1998) Dynamic Noncooperative Game Theory, 2nd edn. SIAM, Philadelphia
6. Li T, Gajic Z (1994) Lyapunov iterations for solving coupled algebraic Lyapunov equations of Nash differential games and algebraic Riccati equations of zero-sum games. New Trends Dynam Appl. Birkhäuser, Boston, pp 489–494
7. Freiling G, Jank G, Abou-Kandil H (2002) On global existence of solutions to coupled matrix Riccati equations in closed-loop Nash games. IEEE Trans Autom Control 41(2):264–269
8. Jungers M, De Pieri E, Abu-Kandil H (2007) Solving coupled Riccati equations for closed-loop Nash strategy by lack of trust approach. Int J Tomography Stat 7:49–54
9. Sutton R (1988) Learning to predict by the methods of temporal differences. Mach Learn 3(1):9–44
10. Lewis FL, Vrabie D (2009) Reinforcement learning and adaptive dynamic programming for feedback control. IEEE Circuits Syst Mag 9(3):32–50
11. Lewis FL, Vrabie D, Vamvoudakis K (2012) Reinforcement learning and feedback control. IEEE Control Syst 32(6):76–105
12. Werbos PJ (1992) Approximate dynamic programming for real-time control and neural modeling. In: White DA, Sofge DA (eds) Handbook of intelligent control. Multiscience Press, Brentwood
13. Murray JJ, Cox CJ, Lendaris GG, Saeks R (2002) Adaptive dynamic programming. IEEE Trans Syst Man Cybern Part C Appl Rev 32(2):140–153
14. Bertsekas DP, Tsitsiklis JN (1996) Neuro-dynamic Programming. Athena Scientific, MA
15. Vrabie D, Lewis FL (2009) Neural network approach to continuous-time direct adaptive optimal control for partially unknown nonlinear systems. Neural Netw 22(3):237–246
16. Vamvoudakis K, Lewis FL (2010) Online actor-critic algorithm to solve the continuous infinite time horizon optimal control problem. Automatica 46(5):878–888
17. Bhasin S, Kamalapurkar R, Johnson M, Vamvoudakis K, Lewis FL, Dixon WE (2012) A novel actor-critic-identifier architecture for approximate optimal control of uncertain nonlinear systems. Automatica 49(1):82–92
18. Modares H, Lewis FL, Naghibi Sistani MB (2013) Adaptive optimal control of unknown constrained-input systems using policy iteration and neural networks. IEEE Trans Neural Netw Learning Syst 24(10):1513–1525
19. Vrabie D, Lewis FL (2011) Adaptive dynamic programming for online solution of a zero-sum differential game. J Control Theory Appl 9(3):353–360
20. Vamvoudakis K, Lewis FL (2010) Online solution of nonlinear two-player zero-sum games using synchronous policy iteration. In: Proc. 49th IEEE CDC, pp 3040–3047
21. Modares H, Lewis FL, Naghibi Sistani MB (2014) Online solution of nonquadratic two-player zero-sum games arising in the H∞ control of constrained input systems. Int J Adapt Cont Sig Proc 28(3–5):232–254
22. Johnson M, Bhasin S, Dixon WE (2011) Nonlinear two-player zero-sum game approximate solution using a policy iteration algorithm. In: Proc. IEEE CDC, pp 142–147
23. Vrabie D, Lewis FL (2010) Integral reinforcement learning for online computation of feedback Nash strategies of nonzero-sum differential games. In: Proc. 49th IEEE CDC, pp 3066–3071
24. Vamvoudakis K, Lewis FL (2011) Multi-player non-zero-sum games: online adaptive learning solution of coupled Hamilton-Jacobi equations. Automatica 47(8):1556–1569
25. Zhang H, Cui L, Luo Y (2013) Near-optimal control for nonzero-sum differential games of continuous-time nonlinear systems using single-network ADP. IEEE Trans Cybern 45(1):206–216
26. Abu-Khalaf M, Lewis FL (2005) Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach. Automatica 41(5):779–791
27. Abu-Khalaf M, Lewis FL, Huang J (2008) Neurodynamic programming and zero-sum games for constrained control systems. IEEE Trans Neural Netw 19(7):1243–1252
28. Chowdhary GV (2010) Concurrent learning for convergence in adaptive control without persistency of excitation. Dissertation, Georgia Institute of Technology
29. Modares H, Lewis FL, Naghibi Sistani MB, Chowdhary GV, Yucelen T (2013) Adaptive optimal control for the partially-unknown constrained-input using policy iteration with experience replay. AIAA Guidance Navigation and Control Conference, Boston, Massachusetts
30. Modares H, Lewis FL, Naghibi Sistani MB (2014) Integral reinforcement learning and experience replay for adaptive optimal control of partially-unknown constrained-input continuous-time systems. Automatica 50(1):193–202
31. Yasini S, Karimpour A, Naghibi Sistani MB, Modares H (2014) Online concurrent reinforcement learning algorithm to solve two-player zero-sum games for partially unknown nonlinear continuous-time systems. Int J Adapt Cont Sig Proc. doi:10.1002/acs.2485
32. Lewis FL, Vrabie D, Syrmos VL (2012) Optimal control, 3rd edn. Wiley, New York
33. Lyshevski SE (1998) Optimal control of nonlinear continuous-time systems: design of bounded controllers via generalized nonquadratic functionals. In: Proc. IEEE ACC, pp 205–209
34. Hornik K, Stinchcombe M, White H (1990) Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks. Neural Netw 3(5):551–560
35. Wang XZ, Li CG, Yeung DS, Song S, Feng H (2008) A definition of partial derivative of random functions and its application to RBFNN sensitivity analysis. Neurocomputing 71(7–9):1515–1526
36. Ghazikhani A, Monsefi R, Sadoghi Yazdi H (2014) Online neural network model for non-stationary and imbalanced data stream classification. Int J Mach Learn Cyber 5(1):51–62. doi:10.1007/s13042-013-0180-6
37. Barakat M, Lefebvre D, Khalil M, Druaux F, Mustapha O (2013) Parameter selection algorithm with self adaptive growing neural network classifier for diagnosis issues. Int J Mach Learn Cyber 4(3):217–233. doi:10.1007/s13042-012-0089-5
38. Nevistic V, Primbs JA (1996) Constrained nonlinear optimal control: A converse HJB approach. California Institute of Technology, Tech. Rep
39. Raja R, Karthik Raja U, Samidurai R, Leelamani A (2014) Dynamic analysis of discrete-time BAM neural networks with stochastic perturbations and impulses. Int J Mach Learn Cyber 5(1):39–50. doi:10.1007/s13042-013-0199-8
40. Hardy G, Littlewood J, Polya G (1998) Inequalities, 2nd edn. Cambridge University Press, Cambridge

Author information

Authors and Affiliations

Department of Electrical Engineering, Ferdowsi University of Mashhad, Mashhad, 91775-1111, Iran
Sholeh Yasini, Mohammad Bagher Naghibi Sitani & Ali Kirampor

Corresponding author

Correspondence to Mohammad Bagher Naghibi Sitani.

Additional information

An erratum to this article is available at http://dx.doi.org/10.1007/s13042-015-0336-7.

Appendices

Appendix A

Proof of Theorem 1

The convergence proof is based on Lyapunov analysis. We consider the following positive definite Lyapunov candidate

$$L = V_{1} (x) + V_{2} (x) + \overbrace {{\frac{1}{2}\tilde{W}_{1}^{T} a_{1}^{ - 1} \tilde{W}_{1} }}^{{L_{1} }} + \overbrace {{\frac{1}{2}\tilde{W}_{2}^{T} a_{2}^{ - 1} \tilde{W}_{2} }}^{{L_{2} }} + \frac{1}{2}\tilde{W}_{3}^{T} a_{3}^{ - 1} \tilde{W}_{3} + \frac{1}{2}\tilde{W}_{4}^{T} a_{4}^{ - 1} \tilde{W}_{4}$$

(A.1)

where $V_{1} (x)$, $V_{2} (x)$ are approximate solutions to the constrained coupled HJ Eq. 10. The derivative of the Lyapunov function is given by

$$\dot{L}(x) = \dot{V}_{1} (x) + \dot{V}_{2} (x) + \overbrace {{\tilde{W}_{1}^{T} a_{1}^{ - 1} \dot{\tilde{W}}_{1} }}^{{\dot{L}_{1} }} + \overbrace {{\tilde{W}_{2}^{T} a_{2}^{ - 1} \dot{\tilde{W}}_{2} }}^{{\dot{L}_{2} }} + \tilde{W}_{3}^{T} a_{3}^{ - 1} \dot{\tilde{W}}_{3} + \tilde{W}_{4}^{T} a_{4}^{ - 1} \dot{\tilde{W}}_{4}$$

(A.2)

The first term in (A.2) is

$$\begin{gathered} \dot{V}_{1} (x) = \nabla V_{1} \dot{x} = \left( {W_{1}^{T} \nabla \sigma_{1} + \nabla \varepsilon_{1}^{T} } \right)\,\left( {f + g_{1} \hat{u}_{1} + g_{2} \hat{u}_{2} } \right) \hfill \\ = W_{1}^{T} \nabla \sigma_{1} f - W_{1}^{T} \nabla \sigma_{1} g_{1} \bar{u}_{1} \tanh (\hat{D}_{1} ) - W_{1}^{T} \nabla \sigma_{1} g_{2} \bar{u}_{2} \tanh (\hat{D}_{2} ) + \varepsilon^{\prime}_{1} (x) \hfill \\ \end{gathered}$$

(A.3)

where $\varepsilon^{\prime}_{1} (x) = \nabla \varepsilon_{1}^{T} (f - g_{1} \bar{u}_{1} \tanh (\hat{D}_{1} ) - g_{2} \bar{u}_{2} \tanh (\hat{D}_{2} ))$.

Adding and subtracting the terms $W_{1}^{T} \nabla \sigma_{1} g_{1} \bar{u}_{1} \tanh (D_{1} )$ and $W_{1}^{T} \nabla \sigma_{1} g_{2} \bar{u}_{2} \tanh (D_{2} )$ in (A.3) gives

$$\begin{gathered} \dot{V}_{1} (x) = W_{1}^{T} \varsigma_{1} (t) \hfill \\ + W_{1}^{T} \nabla \sigma_{1} g_{1} \bar{u}_{1} \left( {\tanh (D_{1} ) - \tanh (\hat{D}_{1} )} \right)\, + W_{1}^{T} \nabla \sigma_{1} g_{2} \bar{u}_{2} \left( {\tanh (D_{2} ) - \tanh (\hat{D}_{2} )} \right) + \varepsilon^{\prime}_{1} (x) \hfill \\ \end{gathered}$$

(A.4)

From the HJ Eq. 14 we have

$$W_{1}^{T} \varsigma_{1} (t) = - Q_{1} (x(t)) - M_{1} (u_{1} (t)) - M_{1} (u_{2} (t)) + \varepsilon_{CHJ1}$$

(A.5)

The terms $M_{1} (u_{1} (t))$ and $M_{1} (u_{2} (t))$ are obtained as follows by substituting (9) into (3)

$$M_{1} (u_{1} (t)) = \bar{u}_{1} W_{1}^{T} \nabla \sigma_{1} g_{1} \tanh (D_{1} ) + \bar{u}_{1}^{2} \bar{R}_{11} \ln (\underline{{\mathbf{1}}} - \tanh^{2} (D_{1} ))$$

(A.6)

$$M_{1} (u_{2} (t)) = \bar{u}_{2} W_{2}^{T} \nabla \sigma_{2} g_{2} R_{22}^{ - T} R_{12} \tanh (D_{2} ) + \bar{u}_{2}^{2} \bar{R}_{12} \ln (\underline{{\mathbf{1}}} - \tanh^{2} (D_{2} ))$$

(A.7)

where $\underline{{\mathbf{1}}}$ is a column vector with all elements equal to one, and $\bar{R}_{11} \in \Re^{{1 \times m_{1} }}$ and $\bar{R}_{12} \in \Re^{{1 \times m_{2} }}$ are row vectors whose elements are the main-diagonal elements of $R_{11}$ and $R_{12}$, respectively. Substituting (A.5) into (A.4) gives

$$\begin{gathered} \dot{V}_{1} (x) = - Q_{1} (x(t)) - M_{1} (u_{1} (t)) - M_{1} (u_{2} (t))\, + W_{1}^{T} \nabla \sigma_{1} g_{1} \bar{u}_{1} \left( {\tanh (D_{1} ) - \tanh (\hat{D}_{1} )} \right) \hfill \\ + W_{1}^{T} \nabla \sigma_{1} g_{2} \bar{u}_{2} \left( {\tanh (D_{2} ) - \tanh (\hat{D}_{2} )} \right) + \varepsilon^{\prime}_{1} (x) + \varepsilon_{CHJ1} \hfill \\ \end{gathered}$$

(A.8)
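The closed form in (A.6) can be checked numerically in the scalar case. The sketch below assumes that the non-quadratic cost in Eq. 3 has the usual constrained-input form $M(u) = 2\int_{0}^{u} \bar{u}\tanh^{-1}(v/\bar{u})R\,dv$; Eq. 3 is not reproduced in this appendix, so that integral, the function names, and the numerical values are illustrative assumptions rather than the authors' code.

```python
import math

def nonquadratic_cost_closed_form(u, u_bar, r):
    """Closed form of the ASSUMED cost M(u) = 2 * int_0^u u_bar * atanh(v / u_bar) * r dv."""
    ratio = u / u_bar
    return 2.0 * u_bar * r * u * math.atanh(ratio) + u_bar**2 * r * math.log(1.0 - ratio**2)

def nonquadratic_cost_quadrature(u, u_bar, r, steps=20000):
    """Trapezoidal approximation of the same integral, used only as a cross-check."""
    h = u / steps
    total = 0.0
    for i in range(steps + 1):
        v = i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * 2.0 * u_bar * r * math.atanh(v / u_bar)
    return total * h

# Hypothetical scalar values: input bound, control weight, and tanh argument D.
u_bar, r, d = 2.0, 1.5, 0.7
u = -u_bar * math.tanh(d)          # saturated control, so |u| < u_bar
w_grad_g = 2.0 * u_bar * r * d     # scalar stand-in for W^T grad(sigma) g, from D = W^T grad(sigma) g / (2 u_bar r)

closed = nonquadratic_cost_closed_form(u, u_bar, r)
# Same quantity written in the form of (A.6).
appendix_form = u_bar * w_grad_g * math.tanh(d) + u_bar**2 * r * math.log(1.0 - math.tanh(d) ** 2)
numeric = nonquadratic_cost_quadrature(u, u_bar, r)
print(closed, appendix_form, numeric)
```

Under these assumptions the three values agree and are positive, which is the property of $M_{1}$ that the bound on $\dot{V}_{1}$ relies on.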

According to Assumptions 1 and 2, one can easily show that

$$\left\| {\varepsilon^{\prime}_{1} (x)} \right\| \le b_{\varepsilon 1x} b_{f} \left\| x \right\| + b_{\varepsilon 1x} \left( {\bar{u}_{1} b_{g1} + \bar{u}_{2} b_{g2} } \right).$$

(A.9)

Next, using (A.9) and the fact that $M_{1} (u_{1} (t))$ and $M_{1} (u_{2} (t))$ are positive definite, and noting that since $Q_{1} (x) \ge 0$, there exists a $q_{1}$ such that $x^{T} q_{1} x < Q_{1} (x)$ for $x \in \varOmega$, (A.8) becomes

$$\dot{V}_{1} < - x^{T} q_{1} x + b_{\varepsilon 1x} b_{f} \left\| x \right\| + 2\bar{u}_{1} b_{g1} b_{\sigma 1x} W_{1} + 2\bar{u}_{2} b_{g2} b_{\sigma 1x} W_{1} + b_{\varepsilon 1x} \left( {\bar{u}_{1} b_{g1} + \bar{u}_{2} b_{g2} } \right) + \varepsilon_{h1}$$

(A.10)

where $\varepsilon_{h1}$ is the bound for $\varepsilon_{CHJ1}$.

Denoting $k^{\prime}_{1} = b_{\varepsilon 1x} b_{f}$, $k^{\prime}_{2} = 2b_{\sigma 1x} W_{1} \left( {\bar{u}_{1} b_{g1} + \bar{u}_{2} b_{g2} } \right) + b_{\varepsilon 1x} \left( {\bar{u}_{1} b_{g1} + \bar{u}_{2} b_{g2} } \right) + \varepsilon_{h1}$, (A.10) becomes

$$\dot{V}_{1} < - x^{T} q_{1} x + k^{\prime}_{1} \left\| x \right\| + k^{\prime}_{2}$$

(A.11)
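A bound of the form (A.11) makes $\dot{V}_{1}$ negative once $\left\| x \right\|$ is large enough. Assuming $q_{1}$ is symmetric positive definite, $x^{T} q_{1} x \ge \lambda_{\min}(q_{1})\left\| x \right\|^{2}$, so it suffices to find the positive root of a scalar quadratic. The helper names and numbers below are hypothetical and serve only to illustrate the arithmetic.

```python
import math

def ultimate_bound(lambda_min_q, k1, k2):
    """Positive root of -lambda*r**2 + k1*r + k2 = 0; the quadratic is negative for larger r."""
    if lambda_min_q <= 0.0 or k1 < 0.0 or k2 < 0.0:
        raise ValueError("expects lambda_min_q > 0 and k1, k2 >= 0")
    half = k1 / (2.0 * lambda_min_q)
    return half + math.sqrt(half**2 + k2 / lambda_min_q)

def derivative_bound(r, lambda_min_q, k1, k2):
    """Upper bound on the Lyapunov derivative as a function of r = ||x||."""
    return -lambda_min_q * r**2 + k1 * r + k2

# Hypothetical constants chosen so the root is easy to verify by hand.
radius = ultimate_bound(lambda_min_q=2.0, k1=1.0, k2=3.0)
print(radius, derivative_bound(radius, 2.0, 1.0, 3.0))
```

Outside the ball of this radius the bound is strictly negative, and inside it the bound can be positive, which is why the analysis yields ultimate boundedness rather than asymptotic convergence.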

Similarly, by noting that $x^{T} q_{2} x < Q_{2} (x)$ for $x \in \varOmega$, for the second term in (A.2) one obtains

$$\dot{V}_{2} < - x^{T} q_{2} x + k^{\prime\prime}_{1} \left\| x \right\| + k^{\prime\prime}_{2}$$

(A.12)

where $k^{\prime\prime}_{1} = b_{\varepsilon 2x} b_{f}$, $k^{\prime\prime}_{2} = 2b_{\sigma 2x} W_{2} \left( {\bar{u}_{1} b_{g1} + \bar{u}_{2} b_{g2} } \right) + b_{\varepsilon 2x} \left( {\bar{u}_{1} b_{g1} + \bar{u}_{2} b_{g2} } \right) + \varepsilon_{h2}$, and $\varepsilon_{h2}$ is the bound for $\varepsilon_{CHJ2}$.

Using (33) and the fact that $\dot{\tilde{W}}_{1} = - \dot{\hat{W}}_{1}$, the third term of (A.2) is obtained as

$$\dot{L}_{1} = \tilde{W}_{1}^{T} a_{1} a_{1}^{ - 1} \left( {\frac{{\hat{\varsigma }_{1} (t)}}{{\left( {\hat{\varsigma }_{1}^{T} (t)\hat{\varsigma }_{1} (t) + 1} \right)^{2} }}e_{1} (t) + \sum\limits_{k = 1}^{l} {\frac{{\hat{\varsigma }_{1k} }}{{\left( {\hat{\varsigma }_{1k}^{T} \hat{\varsigma }_{1k} + 1} \right)^{2} }}e_{1} (t_{k} )} } \right)$$

(A.13)

where $\hat{\varsigma }_{1k} = \hat{\varsigma }_{1} (t_{k} )$. For $e_{1} (t)$ in (A.13) we have

$$e_{1} (t) = Q_{1} (x) + M_{1} (\hat{u}_{1} (t)) + M_{1} (\hat{u}_{2} (t)) + \hat{W}_{1}^{T} \hat{\varsigma }_{1} (t)$$

(A.14)
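The critic law implied by (A.13), with the residual of (A.14) evaluated at both the current time and the recorded times $t_{k}$, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the scalar gain, the two-dimensional regressor, and the synthetic recorded costs are assumptions made so that the example is self-contained.

```python
import numpy as np

def critic_weight_rate(w_hat, zeta_now, cost_now, memory, gain):
    """Time derivative of the critic weights: current sample plus recorded samples."""
    def normalized_term(zeta, cost):
        residual = cost + w_hat @ zeta                     # e = Q + M-terms + W_hat^T zeta
        return zeta / (zeta @ zeta + 1.0) ** 2 * residual  # normalization as in (A.13)
    total = normalized_term(zeta_now, cost_now)
    for zeta_k, cost_k in memory:                          # concurrent-learning sum over t_k
        total = total + normalized_term(zeta_k, cost_k)
    return -gain * total

# Synthetic setup (assumption): recorded costs are consistent with hypothetical ideal weights.
w_true = np.array([1.0, -2.0])
recorded_regressors = (np.array([1.0, 0.0]), np.array([0.0, 1.0]))
memory = [(z, -w_true @ z) for z in recorded_regressors]

w_hat = np.zeros(2)
dt = 0.01
for _ in range(2000):
    # The current regressor is zero, i.e. the live signal carries no excitation at all.
    w_hat = w_hat + dt * critic_weight_rate(w_hat, np.zeros(2), 0.0, memory, gain=4.0)
print(w_hat)
```

In this toy setting the recorded regressors span the weight space, so the estimate approaches the ideal weights even though the current signal is not exciting. That is the role the rank condition on recorded data plays in place of the PE condition.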

Adding zero from (17) to (A.14) gives

$$\begin{gathered} e_{1} (t) = M_{1} (\hat{u}_{2} (t)) - M_{1} (u_{2} (t)) + M_{1} (\hat{u}_{1} (t)) - M_{1} (u_{1} (t)) - \tilde{W}_{1}^{T} \hat{\varsigma }_{1} (t) + \varepsilon_{CHJ1} \hfill \\ + W_{1}^{T} \nabla \sigma_{1} \left( {f - g_{1} \bar{u}_{1} \tanh (\hat{D}_{1} ) - g_{2} \bar{u}_{2} \tanh (\hat{D}_{2} )} \right) \hfill \\ - W_{1}^{T} \nabla \sigma_{1} \left( {f - g_{1} \bar{u}_{1} \tanh (D_{1} ) - g_{2} \bar{u}_{2} \tanh (D_{2} )} \right) \hfill \\ \end{gathered}$$

(A.15)

where $M_{1} (\hat{u}_{1} (t))$ and $M_{1} (\hat{u}_{2} (t))$ are obtained as follows by substituting (25) and (26) into (3)

$$M_{1} (\hat{u}_{1} (t)) = \bar{u}_{1} \hat{W}_{3}^{T} \nabla \sigma_{1} g_{1} \tanh (\hat{D}_{1} ) + \bar{u}_{1}^{2} \bar{R}_{11} \ln (\underline{{\mathbf{1}}} - \tanh^{2} (\hat{D}_{1} ))$$

(A.16)

$$M_{1} (\hat{u}_{2} (t)) = \bar{u}_{2} \hat{W}_{4}^{T} \nabla \sigma_{2} g_{2} R_{22}^{ - T} R_{12} \tanh (\hat{D}_{2} ) + \bar{u}_{2}^{2} \bar{R}_{12} \ln (\underline{{\mathbf{1}}} - \tanh^{2} (\hat{D}_{2} ))$$

(A.17)

Using (A.6) and (A.16), $M_{1} (\hat{u}_{1} (t)) - M_{1} (u_{1} (t))$ is obtained as

$$\begin{gathered} M_{1} (\hat{u}_{1} (t)) - M_{1} (u_{1} (t)) = \hat{W}_{3}^{T} \nabla \sigma_{1} g_{1} \bar{u}_{1} \tanh (\hat{D}_{1} ) + \bar{u}_{1}^{2} \bar{R}_{11} \ln (\underline{{\mathbf{1}}} - \tanh^{2} (\hat{D}_{1} )) \hfill \\ - W_{1}^{T} \nabla \sigma_{1} g_{1} \bar{u}_{1} \tanh (D_{1} ) - \bar{u}_{1}^{2} \bar{R}_{11} \ln (\underline{{\mathbf{1}}} - \tanh^{2} (D_{1} )) \hfill \\ \end{gathered}$$

(A.18)

Next, the term $\ln (\underline{{\mathbf{1}}} - \tanh^{2} (D_{1} ))$ can be closely approximated as [30]

$$\ln (\underline{{\mathbf{1}}} - \tanh^{2} (D_{1} )) \approx - 2D_{1} \text{sgn} (D_{1} ) + \varepsilon_{{D_{1} }} \approx - 2D_{1} \tanh (\delta D_{1} ) + \bar{\varepsilon }_{{D_{1} }}$$

(A.19)

where $\delta$ is a large constant, $\bar{\varepsilon }_{{D_{1} }}$ is a bounded approximation error, and $D_{1} = (1/2\bar{u}_{1} )R_{11}^{ - 1} g_{1}^{T} \nabla \sigma_{1}^{T} W_{1}$.
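The quality of the approximation in (A.19) is easy to probe numerically for a scalar argument. The sketch below uses the identity $\ln (1 - \tanh^{2} (D)) = - 2\ln \cosh (D)$; the grid and the value of $\delta$ are arbitrary choices for illustration.

```python
import numpy as np

def log_one_minus_tanh_sq(d):
    """ln(1 - tanh(d)**2) = -2*ln(cosh(d)), evaluated without overflow."""
    return -2.0 * (np.logaddexp(d, -d) - np.log(2.0))

d = np.linspace(-20.0, 20.0, 4001)
exact = log_one_minus_tanh_sq(d)

# Error of the first approximation: ln(1 - tanh^2 D) ~ -2 D sgn(D).
eps_sign = exact + 2.0 * d * np.sign(d)

# Error after smoothing sgn(D) by tanh(delta * D) with a large delta.
delta = 50.0
eps_smooth = exact + 2.0 * d * np.tanh(delta * d)
smoothing_gap = np.abs(eps_smooth - eps_sign)

print(eps_sign.min(), eps_sign.max(), smoothing_gap.max())
```

On this grid the first error stays between 0 and $2\ln 2$, and replacing the sign function by $\tanh (\delta D)$ changes it by an amount that shrinks like $1/\delta$, which is consistent with treating both errors as bounded.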

Using (A.19), and adding and subtracting $W_{1}^{T} \nabla \sigma_{1} g_{1} \bar{u}_{1} \tanh (\delta \hat{D}_{1} )$ to (A.18), it becomes

$$\begin{gathered} M_{1} (\hat{u}_{1} (t)) - M_{1} (u_{1} (t)) = \hat{W}_{3}^{T} \nabla \sigma_{1} g_{1} \bar{u}_{1} \tanh (\hat{D}_{1} ) + \tilde{W}_{3}^{T} \nabla \sigma_{1} g_{1} \bar{u}_{1} \tanh (\delta \hat{D}_{1} ) \hfill \\ - W_{1}^{T} \nabla \sigma_{1} g_{1} \bar{u}_{1} \tanh (D_{1} ) - W_{1}^{T} \nabla \sigma_{1} g_{1} \bar{u}_{1} \left( {\tanh (\delta \hat{D}_{1} ) - \tanh (\delta D_{1} )} \right) \hfill \\ + \bar{u}_{1}^{2} \bar{R}_{11} \left( {\bar{\varepsilon }_{{\hat{D}_{1} }} - \bar{\varepsilon }_{{D_{1} }} } \right) \hfill \\ \end{gathered}$$

(A.20)

Likewise, $M_{1} (\hat{u}_{2} (t)) - M_{1} (u_{2} (t))$ is obtained as

$$\begin{gathered} M_{1} (\hat{u}_{2} (t)) - M_{1} (u_{2} (t)) = \hat{W}_{4}^{T} \nabla \sigma_{2} g_{2} \bar{u}_{2} R_{22}^{ - T} R_{12} \tanh (\hat{D}_{2} ) + \tilde{W}_{4}^{T} \nabla \sigma_{2} g_{2} \bar{u}_{2} R_{22}^{ - T} R_{12} \tanh (\delta \hat{D}_{2} ) \hfill \\ - W_{2}^{T} \nabla \sigma_{2} g_{2} \bar{u}_{2} R_{22}^{ - T} R_{12} \left( {\tanh (\delta \hat{D}_{2} ) - \tanh (\delta D_{2} )} \right) \hfill \\ - W_{2}^{T} \nabla \sigma_{2} g_{2} \bar{u}_{2} R_{22}^{ - T} R_{12} \tanh (D_{2} ) + \bar{u}_{2}^{2} \bar{R}_{12} \left( {\bar{\varepsilon }_{{\hat{D}_{2} }} - \bar{\varepsilon }_{{D_{2} }} } \right) \hfill \\ \end{gathered}$$

(A.21)

Substituting (A.20) and (A.21) into (A.15) and doing some manipulations gives

$$e_{1} (t) = - \tilde{W}_{1}^{T} (t)\hat{\varsigma }_{1} (t) - \tilde{W}_{3}^{T} \Pi_{1} (t) - \tilde{W}_{4}^{T} \Pi_{2} (t) + \Xi_{1} (t)$$

(A.22)

where $\Pi_{1}$ and $\Pi_{2}$ are defined in (31) and (32), and the bounded term $\Xi_{1} (t)$ is

$$\begin{gathered} \Xi_{1} (t) = W_{1}^{T} \nabla \sigma_{1} g_{2} \bar{u}_{2} \left( {\tanh (D_{2} ) - \tanh (\hat{D}_{2} )} \right) + W_{1}^{T} \nabla \sigma_{1} g_{1} \bar{u}_{1} \left( {\tanh (\delta D_{1} ) - \tanh (\delta \hat{D}_{1} )} \right) \hfill \\ + W_{2}^{T} \nabla \sigma_{2} g_{2} \bar{u}_{2} R_{22}^{ - T} R_{12} \left( {\tanh (\hat{D}_{2} ) - \tanh (D_{2} )} \right) \hfill \\ + W_{2}^{T} \nabla \sigma_{2} g_{2} \bar{u}_{2} \left( {\tanh (\delta D_{2} ) - \tanh (\delta \hat{D}_{2} )} \right) \hfill \\ + \varepsilon_{h1} + \bar{u}_{1}^{2} \bar{R}_{11} \left( {\bar{\varepsilon }_{{\hat{D}_{1} }} - \bar{\varepsilon }_{{D_{1} }} } \right) + \bar{u}_{2}^{2} \bar{R}_{12} \left( {\bar{\varepsilon }_{{\hat{D}_{2} }} - \bar{\varepsilon }_{{D_{2} }} } \right) \hfill \\ \end{gathered}$$

(A.23)

Similarly, $e_{1} (t_{k} )$ in (A.13) is obtained as

$$e_{1} (t_{k} ) = - \tilde{W}_{1}^{T} \hat{\varsigma }_{1k} - \tilde{W}_{3}^{T} \Pi_{1} (t_{k} ) - \tilde{W}_{4}^{T} \Pi_{2} (t_{k} ) + \Xi_{1} (t_{k} ).$$

(A.24)

Substituting (A.22) and (A.24) into (A.13), one gets

$$\begin{gathered} \dot{L}_{1} = \tilde{W}_{1}^{T} \left( {\frac{{\hat{\bar{\varsigma }}_{1} }}{{s_{1} }}\left( { - \hat{\varsigma }_{1}^{T} (t)\tilde{W}_{1} - \Pi_{1}^{T} (t)\tilde{W}_{3} - \Pi_{2}^{T} (t)\tilde{W}_{4} + \Xi_{1} (t)} \right)} \right) \hfill \\ + \tilde{W}_{1}^{T} \left( {\sum\limits_{k = 1}^{l} {\frac{{\hat{\bar{\varsigma }}_{1k} }}{{s_{1k} }}\left( { - \hat{\varsigma }_{1k}^{T} \tilde{W}_{1} - \Pi_{1}^{T} (t_{k} )\tilde{W}_{3} - \Pi_{2}^{T} (t_{k} )\tilde{W}_{4} + \Xi_{1} (t_{k} )} \right)} } \right) \hfill \\ \end{gathered}$$

(A.25)

where $\hat{\bar{\varsigma }}_{1} = \hat{\varsigma }_{1} (t)/(\hat{\varsigma }_{1}^{T} (t)\hat{\varsigma }_{1} (t) + 1)$, $\hat{\bar{\varsigma }}_{1k} \equiv \hat{\bar{\varsigma }}_{1} (t_{k} ) = \hat{\varsigma }_{1k} /(\hat{\varsigma }_{1k}^{T} \hat{\varsigma }_{1k} + 1)$, $s_{1} = \hat{\varsigma }_{1}^{T} (t)\hat{\varsigma }_{1} (t) + 1$, $s_{1k} \equiv s_{1} (t_{k} ) = \hat{\varsigma }_{1k}^{T} \hat{\varsigma }_{1k} + 1$.

Denoting ${\rm T}_{1k} = - \tilde{W}_{3}^{T} \Pi_{1} (t_{k} ) - \tilde{W}_{4}^{T} \Pi_{2} (t_{k} )$ and $\Xi_{1k} = \Xi_{1} (t_{k} )$, (A.25) becomes

$$\begin{gathered} \dot{L}_{1} = - \tilde{W}_{1}^{T} \left[ {\hat{\bar{\varsigma }}_{1} \hat{\bar{\varsigma }}_{1}^{T} + \sum\limits_{k = 1}^{l} {\hat{\bar{\varsigma }}_{1k} \hat{\varsigma }_{1k}^{T} } } \right]\tilde{W}_{1} (t) + \tilde{W}_{1}^{T} \left( {\frac{{\hat{\bar{\varsigma }}_{1} }}{{s_{1} }}\Xi_{1} (t) + \sum\limits_{k = 1}^{l} {\frac{{\hat{\bar{\varsigma }}_{1k} }}{{s_{1k} }}} \left( {T_{1k} + \Xi_{1k} } \right)} \right) \hfill \\ - \tilde{W}_{3}^{T} \Pi_{1} (t)\frac{{\hat{\bar{\varsigma }}_{1}^{T} }}{{s_{1} }}\tilde{W}_{1} - \tilde{W}_{4}^{T} \Pi_{2} (t)\frac{{\hat{\bar{\varsigma }}_{1}^{T} }}{{s_{1} }}\tilde{W}_{1} \hfill \\ \end{gathered}$$

(A.26)

Note that ${\rm T}_{1k}$ depends on the actor NN errors of the recorded past times. Now, using $\tilde{W}_{1} = W_{1} - \hat{W}_{1}$, the third term in (A.2) is obtained as

$$\begin{gathered} \dot{L}_{1} = - \tilde{W}_{1}^{T} \left[ {\hat{\bar{\varsigma }}_{1} \hat{\bar{\varsigma }}_{1}^{T} + \sum\limits_{k = 1}^{l} {\hat{\bar{\varsigma }}_{1k} \hat{\varsigma }_{1k}^{T} } } \right]\tilde{W}_{1} + \tilde{W}_{1}^{T} \left( {\frac{{\hat{\bar{\varsigma }}_{1} }}{{s_{1} }}\Xi_{1} (t) + \sum\limits_{k = 1}^{l} {\frac{{\hat{\bar{\varsigma }}_{1k} }}{{s_{1k} }}} \left( {T_{1k} + \Xi_{1k} } \right)} \right) \hfill \\ - \tilde{W}_{3}^{T} \Pi_{1} (t)\frac{{\hat{\bar{\varsigma }}_{1}^{T} }}{{s_{1} }}W_{1} + \tilde{W}_{3}^{T} \Pi_{1} (t)\frac{{\hat{\bar{\varsigma }}_{1}^{T} }}{{s_{1} }}\hat{W}_{1} - \tilde{W}_{4}^{T} \Pi_{2} (t)\frac{{\hat{\bar{\varsigma }}_{1}^{T} }}{{s_{1} }}W_{1} + \tilde{W}_{4}^{T} \Pi_{2} (t)\frac{{\hat{\bar{\varsigma }}_{1}^{T} }}{{s_{1} }}\hat{W}_{1} \hfill \\ \end{gathered}$$

(A.27)

Similarly, the fourth term in (A.2) can be written as

$$\begin{gathered} \dot{L}_{2} = - \tilde{W}_{2}^{T} \left[ {\hat{\bar{\varsigma }}_{2} \hat{\bar{\varsigma }}_{2}^{T} + \sum\limits_{k = 1}^{l} {\hat{\bar{\varsigma }}_{2k} \hat{\varsigma }_{2k}^{T} } } \right]\tilde{W}_{2} + \tilde{W}_{2}^{T} \left( {\frac{{\hat{\bar{\varsigma }}_{2} }}{{s_{2} }}\Xi_{2} (t) + \sum\limits_{k = 1}^{l} {\frac{{\hat{\bar{\varsigma }}_{2k} }}{{s_{2k} }}} \left( {T_{2k} + \Xi_{2k} } \right)} \right) \hfill \\ - \tilde{W}_{4}^{T} \Pi^{\prime}_{2} (t)\frac{{\hat{\bar{\varsigma }}_{2}^{T} }}{{s_{2} }}W_{2} + \tilde{W}_{4}^{T} \Pi^{\prime}_{2} (t)\frac{{\hat{\bar{\varsigma }}_{2}^{T} }}{{s_{2} }}\hat{W}_{2} - \tilde{W}_{3}^{T} \Pi^{\prime}_{1} (t)\frac{{\hat{\bar{\varsigma }}_{2}^{T} }}{{s_{2} }}W_{2} + \tilde{W}_{3}^{T} \Pi^{\prime}_{1} (t)\frac{{\hat{\bar{\varsigma }}_{2}^{T} }}{{s_{2} }}\hat{W}_{2} \hfill \\ \end{gathered}$$

(A.28)

where $\Pi^{\prime}_{1}$ and $\Pi^{\prime}_{2}$ are defined as (33) and (34), $\hat{\bar{\varsigma }}_{2} = \hat{\varsigma }_{2} (t)/(\hat{\varsigma }_{2}^{T} (t)\hat{\varsigma }_{2} (t) + 1)$, $\hat{\bar{\varsigma }}_{2k} \equiv \hat{\bar{\varsigma }}_{2} (t_{k} ) = \hat{\varsigma }_{2k} /(\hat{\varsigma }_{2k}^{T} \hat{\varsigma }_{2k} + 1)$, $s_{2} = \hat{\varsigma }_{2}^{T} (t)\hat{\varsigma }_{2} (t) + 1$, $s_{2k} = \hat{\varsigma }_{2k}^{T} \hat{\varsigma }_{2k} + 1$, ${\rm T}_{2k} = - \tilde{W}_{3}^{T} \Pi^{\prime}_{1} (t_{k} ) - \tilde{W}_{4}^{T} \Pi^{\prime}_{2} (t_{k} )$, and the bounded term $\Xi_{2} (t)$ is

$$\begin{gathered} \Xi_{2} (t) = W_{2}^{T} \nabla \sigma_{2} g_{1} \bar{u}_{1} \left( {\tanh (D_{1} ) - \tanh (\hat{D}_{1} )} \right) + W_{2}^{T} \nabla \sigma_{2} g_{2} \bar{u}_{2} \left( {\tanh (\delta D_{2} ) - \tanh (\delta \hat{D}_{2} )} \right) \hfill \\ + W_{1}^{T} \nabla \sigma_{1} g_{1} \bar{u}_{1} R_{11}^{ - T} R_{21} \left( {\tanh (\hat{D}_{1} ) - \tanh (D_{1} )} \right) + W_{1}^{T} \nabla \sigma_{1} g_{1} \bar{u}_{1} \left( {\tanh (\delta D_{2} ) - \tanh (\delta \hat{D}_{2} )} \right) \hfill \\ + \varepsilon_{h2} + \bar{u}_{1}^{2} \bar{R}_{22} \left( {\bar{\varepsilon }_{{\hat{D}_{2} }} - \bar{\varepsilon }_{{D_{2} }} } \right) + \bar{u}_{2}^{2} \bar{R}_{21} \left( {\bar{\varepsilon }_{{\hat{D}_{1} }} - \bar{\varepsilon }_{{D_{1} }} } \right) \hfill \\ \end{gathered}$$

(A.29)

Next, using (A.11)–(A.12) and (A.27)–(A.28), the derivative of the Lyapunov function (A.2) becomes

$$\begin{gathered} \dot{L} < - x^{T} qx + k_{1} \left\| x \right\| + k_{2} \hfill \\ - \tilde{W}_{1}^{T} \left[ {\hat{\bar{\varsigma }}_{1} \hat{\bar{\varsigma }}_{1}^{T} + \sum\limits_{k = 1}^{l} {\hat{\bar{\varsigma }}_{1k} \hat{\varsigma }_{1k}^{T} } } \right]\tilde{W}_{1} + \tilde{W}_{1}^{T} \left( {\frac{{\hat{\bar{\varsigma }}_{1} }}{{s_{1} }}\Xi_{1} (t) + \sum\limits_{k = 1}^{l} {\frac{{\hat{\bar{\varsigma }}_{1k} }}{{s_{1k} }}} \left( {T_{1k} + \Xi_{1k} } \right)} \right) \hfill \\ - \tilde{W}_{2}^{T} \left[ {\hat{\bar{\varsigma }}_{2} \hat{\bar{\varsigma }}_{2}^{T} + \sum\limits_{k = 1}^{l} {\hat{\bar{\varsigma }}_{2k} \hat{\varsigma }_{2k}^{T} } } \right]\tilde{W}_{2} + \tilde{W}_{2}^{T} \left( {\frac{{\hat{\bar{\varsigma }}_{2} }}{{s_{2} }}\Xi_{2} (t) + \sum\limits_{k = 1}^{l} {\frac{{\hat{\bar{\varsigma }}_{2k} }}{{s_{2k} }}} \left( {T_{2k} + \Xi_{2k} } \right)} \right) \hfill \\ - \tilde{W}_{3}^{T} \left( {a_{3}^{ - 1} \dot{\hat{W}}_{3} - \Pi_{1} (t)\frac{{\hat{\bar{\varsigma }}_{1}^{T} }}{{s_{1} }}\hat{W}_{1} - \Pi^{\prime}_{1} (t)\frac{{\hat{\bar{\varsigma }}_{2}^{T} }}{{s_{2} }}\hat{W}_{2} } \right) - \tilde{W}_{3}^{T} \left( {\Pi_{1} (t)\frac{{\hat{\bar{\varsigma }}_{1}^{T} }}{{s_{1} }}W_{1} + \Pi^{\prime}_{1} (t)\frac{{\hat{\bar{\varsigma }}_{2}^{T} }}{{s_{2} }}W_{2} } \right) \hfill \\ - \tilde{W}_{4}^{T} \left( {a_{4}^{ - 1} \dot{\hat{W}}_{4} - \Pi_{2} (t)\frac{{\hat{\bar{\varsigma }}_{1}^{T} }}{{s_{1} }}\hat{W}_{1} - \Pi^{\prime}_{2} (t)\frac{{\hat{\bar{\varsigma }}_{2}^{T} }}{{s_{2} }}\hat{W}_{2} } \right) - \tilde{W}_{4}^{T} \left( {\Pi_{2} (t)\frac{{\hat{\bar{\varsigma }}_{1}^{T} }}{{s_{1} }}W_{1} + \Pi^{\prime}_{2} (t)\frac{{\hat{\bar{\varsigma }}_{2}^{T} }}{{s_{2} }}W_{2} } \right) \hfill \\ \end{gathered}$$

(A.30)

where $k_{1} = k^{\prime}_{1} + k^{\prime\prime}_{1}$, $k_{2} = k^{\prime}_{2} + k^{\prime\prime}_{2}$, $q = q_{1} + q_{2}$. Now, we define the actor NN tuning laws for the first and second agents as

$$\dot{\hat{W}}_{3} = - a_{3} \left( {\left( {B_{3} \hat{W}_{3} - B_{1} \hat{\bar{\varsigma }}_{1}^{T} \hat{W}_{1} } \right) - \Pi_{1} (t)\frac{{\hat{\bar{\varsigma }}_{1}^{T} }}{{s_{1} }}\hat{W}_{1} - \Pi^{\prime}_{1} (t)\frac{{\hat{\bar{\varsigma }}_{2}^{T} }}{{s_{2} }}\hat{W}_{2} } \right)$$

(A.31)

$$\dot{\hat{W}}_{4} = - a_{4} \left( {\left( {B_{4} \hat{W}_{4} - B_{2} \hat{\bar{\varsigma }}_{2}^{T} \hat{W}_{2} } \right) - \Pi_{2} (t)\frac{{\hat{\bar{\varsigma }}_{1}^{T} }}{{s_{1} }}\hat{W}_{1} - \Pi^{\prime}_{2} (t)\frac{{\hat{\bar{\varsigma }}_{2}^{T} }}{{s_{2} }}\hat{W}_{2} } \right)$$

(A.32)

These add to $\dot{L}$ the terms

$$\begin{gathered} \tilde{W}_{3}^{T} B_{3} W_{1} - \tilde{W}_{3}^{T} B_{3} \tilde{W}_{3} - \tilde{W}_{3}^{T} B_{1} \hat{\bar{\varsigma }}_{1}^{T} W_{1} + \tilde{W}_{3}^{T} B_{1} \hat{\bar{\varsigma }}_{1}^{T} \tilde{W}_{1} + \hfill \\ \tilde{W}_{4}^{T} B_{4} W_{2} - \tilde{W}_{4}^{T} B_{4} \tilde{W}_{4} - \tilde{W}_{4}^{T} B_{2} \hat{\bar{\varsigma }}_{2}^{T} W_{2} + \tilde{W}_{4}^{T} B_{2} \hat{\bar{\varsigma }}_{2}^{T} \tilde{W}_{2} \hfill \\ \end{gathered}$$

(A.33)
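The first agent's tuning law (A.31) can be transcribed directly into code. The sketch below assumes that the actor and critic share the same basis, so that all weight vectors, $B_{1}$, $\Pi_{1}$ and $\Pi^{\prime}_{1}$ have length $n$ and $B_{3}$ is $n \times n$. Since the definitions of $\Pi_{1}$ and $\Pi^{\prime}_{1}$ are not reproduced in this appendix, they are passed in as plain arrays, and the function and variable names are hypothetical.

```python
import numpy as np

def actor_weight_rate(w3_hat, w1_hat, w2_hat, zeta1_bar, zeta2_bar, s1, s2,
                      pi1, pi1_prime, b3, b1, a3):
    """Right-hand side of (A.31) for the first agent's actor weights."""
    # Terms built from the design parameters B_3 and B_1.
    design = b3 @ w3_hat - b1 * (zeta1_bar @ w1_hat)
    # Terms that couple the actor to both critics through Pi_1 and Pi_1'.
    coupling = (pi1 * (zeta1_bar @ w1_hat) / s1
                + pi1_prime * (zeta2_bar @ w2_hat) / s2)
    return -a3 * (design - coupling)

# Hypothetical numbers, chosen only so the result can be checked by hand.
rate = actor_weight_rate(
    w3_hat=np.array([1.0, 2.0]), w1_hat=np.array([1.0, 0.0]), w2_hat=np.array([0.0, 1.0]),
    zeta1_bar=np.array([0.5, 0.5]), zeta2_bar=np.array([0.2, 0.4]), s1=2.0, s2=4.0,
    pi1=np.array([1.0, 1.0]), pi1_prime=np.array([2.0, 0.0]),
    b3=np.eye(2), b1=np.array([1.0, 0.0]), a3=1.0,
)
print(rate)
```

The second agent's law (A.32) has the same structure with $(B_{4}, B_{2}, \Pi_{2}, \Pi^{\prime}_{2})$ in place of $(B_{3}, B_{1}, \Pi_{1}, \Pi^{\prime}_{1})$.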

Using (A.33), and applying Young's inequality [40] to the terms $\tilde{W}_{3}^{T} B_{1} \hat{\bar{\varsigma }}_{1}^{T} \tilde{W}_{1}$, $\tilde{W}_{4}^{T} B_{2} \hat{\bar{\varsigma }}_{2}^{T} \tilde{W}_{2}$, $\dot{L}$ becomes

$$\begin{gathered} \dot{L} < - x^{T} qx + k_{1} \left\| x \right\| + k_{2} \hfill \\ - \tilde{W}_{1}^{T} \left[ {\hat{\bar{\varsigma }}_{1} \hat{\bar{\varsigma }}_{1}^{T} + \sum\limits_{k = 1}^{l} {\hat{\bar{\varsigma }}_{1k} \hat{\varsigma }_{1k}^{T} } } \right]\tilde{W}_{1} + \tilde{W}_{1}^{T} \left( {\frac{{\hat{\bar{\varsigma }}_{1} }}{{s_{1} }}\Xi_{1} (t) + \sum\limits_{k = 1}^{l} {\frac{{\hat{\bar{\varsigma }}_{1k} }}{{s_{1k} }}} \left( {T_{1k} + \Xi_{1k} } \right)} \right) \hfill \\ - \tilde{W}_{3}^{T} B_{3} \tilde{W}_{3} + \tilde{W}_{3}^{T} B_{3} W_{1} - \tilde{W}_{3}^{T} B_{1} \hat{\bar{\varsigma }}_{1}^{T} W_{1} + \frac{1}{2}\tilde{W}_{3}^{T} B_{1} B_{1}^{T} \tilde{W}_{3} + \frac{1}{2}\tilde{W}_{1}^{T} \hat{\bar{\varsigma }}_{1} \hat{\bar{\varsigma }}_{1}^{T} \tilde{W}_{1} \hfill \\ - \tilde{W}_{2}^{T} \left[ {\hat{\bar{\varsigma }}_{2} \hat{\bar{\varsigma }}_{2}^{T} + \sum\limits_{k = 1}^{l} {\hat{\bar{\varsigma }}_{2k} \hat{\varsigma }_{2k}^{T} } } \right]\tilde{W}_{2} + \tilde{W}_{2}^{T} \left( {\frac{{\hat{\bar{\varsigma }}_{2} }}{{s_{2} }}\Xi_{2} (t) + \sum\limits_{k = 1}^{l} {\frac{{\hat{\bar{\varsigma }}_{2k} }}{{s_{2k} }}} \left( {T_{2k} + \Xi_{2k} } \right)} \right) \hfill \\ - \tilde{W}_{4}^{T} B_{4} \tilde{W}_{4} + \tilde{W}_{4}^{T} B_{4} W_{2} - \tilde{W}_{4}^{T} B_{2} \hat{\bar{\varsigma }}_{2}^{T} W_{2} + \frac{1}{2}\tilde{W}_{4}^{T} B_{2} B_{2}^{T} \tilde{W}_{4} + \frac{1}{2}\tilde{W}_{2}^{T} \hat{\bar{\varsigma }}_{2} \hat{\bar{\varsigma }}_{2}^{T} \tilde{W}_{2} \hfill \\ - \tilde{W}_{3}^{T} \left( {\Pi_{1} (t)\frac{{\hat{\bar{\varsigma }}_{1}^{T} }}{{s_{1} }}W_{1} + \Pi^{\prime}_{1} (t)\frac{{\hat{\bar{\varsigma }}_{2}^{T} }}{{s_{2} }}W_{2} } \right) - \tilde{W}_{4}^{T} \left( {\Pi_{2} (t)\frac{{\hat{\bar{\varsigma }}_{1}^{T} }}{{s_{1} }}W_{1} + \Pi^{\prime}_{2} (t)\frac{{\hat{\bar{\varsigma }}_{2}^{T} }}{{s_{2} }}W_{2} } \right) \hfill \\ \end{gathered}$$

(A.34)
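For reference, the Young's inequality step can be written out explicitly. This is a sketch that assumes the standard form $a^{T} b \le \frac{1}{2}a^{T} a + \frac{1}{2}b^{T} b$ with the identification $a = B_{1}^{T} \tilde{W}_{3}$ and $b = \hat{\bar{\varsigma }}_{1}^{T} \tilde{W}_{1}$; the symbols $a$ and $b$ are introduced here only for illustration. It reproduces the two $\frac{1}{2}$-weighted terms in the third line of (A.34):

$$\tilde{W}_{3}^{T} B_{1} \hat{\bar{\varsigma }}_{1}^{T} \tilde{W}_{1} \le \frac{1}{2}\tilde{W}_{3}^{T} B_{1} B_{1}^{T} \tilde{W}_{3} + \frac{1}{2}\tilde{W}_{1}^{T} \hat{\bar{\varsigma }}_{1} \hat{\bar{\varsigma }}_{1}^{T} \tilde{W}_{1}$$

The same bound with the indices $(3,1)$ replaced by $(4,2)$ gives the corresponding terms in the fifth line of (A.34).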

Denote $N_{i} = \hat{\bar{\varsigma }}_{i} \hat{\bar{\varsigma }}_{i}^{T} + 2\sum\nolimits_{k = 1}^{l} {\hat{\bar{\varsigma }}_{ik} \hat{\varsigma }_{ik}^{T} }$ and $\varGamma_{i} = \frac{{\hat{\bar{\varsigma }}_{i} }}{{s_{i} }}\Xi_{i} (t) + \sum\nolimits_{k = 1}^{l} {\frac{{\hat{\bar{\varsigma }}_{ik} }}{{s_{ik} }}} \left( {T_{ik} + \Xi_{ik} } \right)$, $i = 1,\,2$. If Condition 1 is satisfied, then $N_{i}$ is positive definite, and thus $\dot{L}$ can be bounded as

$$\begin{gathered} \dot{L} < - x^{T} qx + k_{1} \left\| x \right\| + k_{2} \hfill \\ - 0.5\lambda_{\hbox{min} } (N_{1} )\tilde{W}_{1}^{T} \tilde{W}_{1} + \tilde{W}_{1}^{T} \varGamma_{1} - 0.5\lambda_{\hbox{min} } (N_{2} )\tilde{W}_{2}^{T} \tilde{W}_{2} + \tilde{W}_{2}^{T} \varGamma_{2} \hfill \\ - \tilde{W}_{3}^{T} \left( {B_{3} - \frac{1}{2}B_{1} B_{1}^{T} } \right)\tilde{W}_{3} + \tilde{W}_{3}^{T} \left( {B_{3} W_{1} + B_{1} \hat{\bar{\varsigma }}_{1}^{T} W_{1} + \Pi_{1} \frac{{\hat{\bar{\varsigma }}_{1}^{T} }}{{s_{1} }}W_{1} + \Pi^{\prime}_{1} \frac{{\hat{\bar{\varsigma }}_{2}^{T} }}{{s_{2} }}W_{2} } \right) \hfill \\ - \tilde{W}_{4}^{T} \left( {B_{4} - \frac{1}{2}B_{2} B_{2}^{T} } \right)\tilde{W}_{4} + \tilde{W}_{4}^{T} \left( {B_{4} W_{2} + B_{2} \hat{\bar{\varsigma }}_{2}^{T} W_{2} + \Pi_{2} \frac{{\hat{\bar{\varsigma }}_{1}^{T} }}{{s_{1} }}W_{1} + \Pi^{\prime}_{2} \frac{{\hat{\bar{\varsigma }}_{2}^{T} }}{{s_{2} }}W_{2} } \right) \hfill \\ \end{gathered}$$

(A.35)

where $\lambda_{\hbox{min} } (N_{i} )$ is the minimum eigenvalue of $N_{i}$, $i = 1,\,2$. Define $c = B_{3} - \frac{1}{2}B_{1} B_{1}^{T}$ and $d = B_{4} - \frac{1}{2}B_{2} B_{2}^{T}$.

If we choose the design parameters $B_{1} ,\,B_{2} ,\,B_{3} ,\,B_{4}$ such that $c > 0$ and $d > 0$, then the derivative of the Lyapunov function is less than zero if

$$\left\| x \right\| > \frac{{k_{1} }}{{2\lambda_{\hbox{min} } (q)}} + \sqrt {\frac{{k_{1}^{2} }}{{4\lambda_{\hbox{min} }^{2} (q)}} + \frac{{k_{2} }}{{\lambda_{\hbox{min} } (q)}}}$$

(A.36)

$$\left\| {\tilde{W}_{1} } \right\| > \frac{{2\left\| {\varGamma_{1} } \right\|}}{{\lambda_{\hbox{min} } (N_{1} )}}$$

(A.37)

$$\left\| {\tilde{W}_{2} } \right\| > \frac{{2\left\| {\varGamma_{2} } \right\|}}{{\lambda_{\hbox{min} } (N_{2} )}}$$

(A.38)

$$\left\| {\tilde{W}_{3} } \right\| > \frac{{B_{3} W_{1} + B_{1} \hat{\bar{\varsigma }}_{1}^{T} W_{1} + \Pi_{1} \frac{{\hat{\bar{\varsigma }}_{1}^{T} }}{{s_{1} }}W_{1} + \Pi^{\prime}_{1} \frac{{\hat{\bar{\varsigma }}_{2}^{T} }}{{s_{2} }}W_{2} }}{c}$$

(A.39)

$$\left\| {\tilde{W}_{4} } \right\| > \frac{{B_{4} W_{2} + B_{2} \hat{\bar{\varsigma }}_{2}^{T} W_{2} + \Pi_{2} \frac{{\hat{\bar{\varsigma }}_{1}^{T} }}{{s_{1} }}W_{1} + \Pi^{\prime}_{2} \frac{{\hat{\bar{\varsigma }}_{2}^{T} }}{{s_{2} }}W_{2} }}{d}$$

(A.40)
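The thresholds on $\left\| x \right\|$ and $\left\| {\tilde{W}_{i} } \right\|$ can be recovered by bounding each group of terms in (A.35) separately. The following is a sketch of that step, assuming $q$ is positive definite so that $\lambda_{\hbox{min} } (q) > 0$, and reading $\varGamma_{i}$ in (A.37) and (A.38) through its norm:

$$\begin{gathered} - x^{T} qx + k_{1} \left\| x \right\| + k_{2} \le - \lambda_{\hbox{min} } (q)\left\| x \right\|^{2} + k_{1} \left\| x \right\| + k_{2} \hfill \\ - 0.5\lambda_{\hbox{min} } (N_{i} )\tilde{W}_{i}^{T} \tilde{W}_{i} + \tilde{W}_{i}^{T} \varGamma_{i} \le - \left\| {\tilde{W}_{i} } \right\|\left( {0.5\lambda_{\hbox{min} } (N_{i} )\left\| {\tilde{W}_{i} } \right\| - \left\| {\varGamma_{i} } \right\|} \right),\quad i = 1,\,2 \hfill \\ \end{gathered}$$

The right-hand side of the first line is a downward-opening quadratic in $\left\| x \right\|$, so it is negative once $\left\| x \right\|$ exceeds its positive root, which is the expression in (A.36). The right-hand side of the second line is negative once $\left\| {\tilde{W}_{i} } \right\| > 2\left\| {\varGamma_{i} } \right\|/\lambda_{\hbox{min} } (N_{i} )$.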

Thus, using standard Lyapunov theory, all the critic and actor NN weight estimation errors are UUB, and the system states are guaranteed to never leave their initial compact set.

This completes the proof. $\square$
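The design condition $c > 0$ used in the proof can be checked numerically for a candidate choice of gains. The snippet below is a minimal sketch, not the authors' procedure: it assumes $B_{1}$ is a column vector and $B_{3} = \beta I$ for a scalar $\beta$ (both are illustrative assumptions, as are all function names and numbers), and tests positive definiteness of $c = B_{3} - \frac{1}{2}B_{1} B_{1}^{T}$ with a plain Cholesky factorization. Under these assumptions $c > 0$ exactly when $\beta > \frac{1}{2}\left\| {B_{1} } \right\|^{2}$.

```python
import math
from typing import List

Matrix = List[List[float]]


def design_matrix(beta: float, b1: List[float]) -> Matrix:
    """Build c = beta*I - 0.5*b1*b1^T (hypothetical choice B3 = beta*I)."""
    n = len(b1)
    return [
        [(beta if i == j else 0.0) - 0.5 * b1[i] * b1[j] for j in range(n)]
        for i in range(n)
    ]


def is_positive_definite(m: Matrix) -> bool:
    """Return True if the symmetric matrix m admits a Cholesky factorization."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                d = m[i][i] - s
                if d <= 0.0:
                    return False  # a non-positive pivot means m is not positive definite
                low[i][j] = math.sqrt(d)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return True


# Illustrative numbers only: 0.5 * ||b1||^2 = 2.5 for b1 = [1, 2].
B1 = [1.0, 2.0]
ok_large_beta = is_positive_definite(design_matrix(3.0, B1))  # beta above 2.5
ok_small_beta = is_positive_definite(design_matrix(2.0, B1))  # beta below 2.5
```

The same check applies to $d = B_{4} - \frac{1}{2}B_{2} B_{2}^{T}$ by passing the corresponding gains.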

Appendix B

Proof of Theorem 2

a. Consider all the UUB weight errors in Theorem 2. The approximate constrained coupled HJ equations are

$$\begin{gathered} H_{1} \left( {x,\hat{W}_{1} ,\hat{u}_{1} ,\hat{u}_{2} } \right) = Q_{1} (x) + \hat{W}_{1}^{T} \nabla \sigma_{1} f - \hat{W}_{1}^{T} \nabla \sigma_{1} g_{1} \bar{u}_{1} \tanh (\hat{D}_{1} ) - \hat{W}_{1}^{T} \nabla \sigma_{1} g_{2} \bar{u}_{2} \tanh (\hat{D}_{2} ) \hfill \\ + \hat{W}_{3}^{T} \nabla \sigma_{1} g_{1} \bar{u}_{1} \tanh (\hat{D}_{1} ) - \hat{W}_{3}^{T} \nabla \sigma_{1} g_{1} \bar{u}_{1} \tanh (\delta \hat{D}_{1} ) + \bar{u}_{1}^{2} \bar{R}_{11} \bar{\varepsilon }_{{\hat{D}_{1} }} \hfill \\ + \hat{W}_{4}^{T} \nabla \sigma_{2} g_{2} \bar{u}_{2} R_{22}^{ - T} R_{12} \tanh (\hat{D}_{2} ) - \hat{W}_{4}^{T} \nabla \sigma_{2} g_{2} \bar{u}_{2} R_{22}^{ - T} R_{12} \tanh (\delta \hat{D}_{2} ) + \bar{u}_{2}^{2} \bar{R}_{12} \bar{\varepsilon }_{{\hat{D}_{2} }} \hfill \\ \end{gathered}$$

(B.1)

$$\begin{gathered} H_{2} \left( {x,\hat{W}_{2} ,\hat{u}_{1} ,\hat{u}_{2} } \right) = Q_{2} (x) + \hat{W}_{2}^{T} \nabla \sigma_{2} f - \hat{W}_{2}^{T} \nabla \sigma_{2} g_{1} \bar{u}_{1} \tanh (\hat{D}_{1} ) - \hat{W}_{2}^{T} \nabla \sigma_{2} g_{2} \bar{u}_{2} \tanh (\hat{D}_{2} ) \hfill \\ + \hat{W}_{3}^{T} \nabla \sigma_{1} g_{1} \bar{u}_{1} R_{11}^{ - T} R_{21} \tanh (\hat{D}_{1} ) - \hat{W}_{3}^{T} \nabla \sigma_{1} g_{1} \bar{u}_{1} R_{11}^{ - T} R_{21} \tanh (\delta \hat{D}_{1} ) + \bar{u}_{1}^{2} \bar{R}_{21} \bar{\varepsilon }_{{\hat{D}_{1} }} \hfill \\ + \hat{W}_{4}^{T} \nabla \sigma_{2} g_{2} \bar{u}_{2} \tanh (\hat{D}_{2} ) - \hat{W}_{4}^{T} \nabla \sigma_{2} g_{2} \bar{u}_{2} \tanh (\delta \hat{D}_{2} ) + \bar{u}_{2}^{2} \bar{R}_{22} \bar{\varepsilon }_{{\hat{D}_{2} }} \hfill \\ \end{gathered}$$

(B.2)

After adding zero from HJ equations in (17) and (18) and using the fact that $\tilde{W}_{1} = W_{1} - \hat{W}_{1}$, $\tilde{W}_{2} = W_{2} - \hat{W}_{2}$, $\tilde{W}_{3} = W_{1} - \hat{W}_{3}$, $\tilde{W}_{4} = W_{2} - \hat{W}_{4}$, one has

$$\begin{gathered} H_{1} \left( {x,\hat{W}_{1} ,\hat{u}_{1} ,\hat{u}_{2} } \right) = - \tilde{W}_{1}^{T} \nabla \sigma_{1} f + \tilde{W}_{1}^{T} \nabla \sigma_{1} g_{1} \bar{u}_{1} \tanh (\hat{D}_{1} ) + \tilde{W}_{1}^{T} \nabla \sigma_{1} g_{2} \bar{u}_{2} \tanh (\hat{D}_{2} ) \hfill \\ + \tilde{W}_{3}^{T} \nabla \sigma_{1} g_{1} \bar{u}_{1} \left( {\tanh (\delta \hat{D}_{1} ) - \tanh (\hat{D}_{1} )} \right) \hfill \\ + \tilde{W}_{4}^{T} \nabla \sigma_{2} g_{2} \bar{u}_{2} R_{22}^{ - T} R_{12} \left( {\tanh (\delta \hat{D}_{2} ) - \tanh (\hat{D}_{2} )} \right) \hfill \\ + W_{1}^{T} \nabla \sigma_{1} g_{2} \bar{u}_{2} \left( {\tanh (D_{2} ) - \tanh (\hat{D}_{2} )} \right) \hfill \\ + W_{1}^{T} \nabla \sigma_{1} g_{1} \bar{u}_{1} \left( {\tanh (\delta D_{1} ) - \tanh (\delta \hat{D}_{1} )} \right) \hfill \\ + W_{2}^{T} \nabla \sigma_{2} g_{2} \bar{u}_{2} R_{22}^{ - T} R_{12} \left( {\tanh (\delta D_{2} ) - \tanh (\delta \hat{D}_{2} )} \right) \hfill \\ + W_{2}^{T} \nabla \sigma_{2} g_{2} \bar{u}_{2} R_{22}^{ - T} R_{12} \left( {\tanh (\hat{D}_{2} ) - \tanh (D_{2} )} \right) \hfill \\ + \bar{u}_{1}^{2} \bar{R}_{11} \bar{\varepsilon }_{{\hat{D}_{1} }} + \bar{u}_{2}^{2} \bar{R}_{12} \bar{\varepsilon }_{{\hat{D}_{2} }} - \bar{u}_{1}^{2} \bar{R}_{11} \bar{\varepsilon }_{{D_{1} }} - \bar{u}_{2}^{2} \bar{R}_{12} \bar{\varepsilon }_{{D_{2} }} + \varepsilon_{CHJ1} \hfill \\ \end{gathered}$$

(B.3)

$$\begin{gathered} H_{2} \left( {x,\hat{W}_{2} ,\hat{u}_{1} ,\hat{u}_{2} } \right) = - \tilde{W}_{2}^{T} \nabla \sigma_{2} f + \tilde{W}_{2}^{T} \nabla \sigma_{2} g_{1} \bar{u}_{1} \tanh (\hat{D}_{1} ) + \tilde{W}_{2}^{T} \nabla \sigma_{2} g_{2} \bar{u}_{2} \tanh (\hat{D}_{2} ) \hfill \\ + \tilde{W}_{4}^{T} \nabla \sigma_{2} g_{2} \bar{u}_{2} \left( {\tanh (\delta \hat{D}_{2} ) - \tanh (\hat{D}_{2} )} \right) \hfill \\ + \tilde{W}_{3}^{T} \nabla \sigma_{1} g_{1} \bar{u}_{1} R_{11}^{ - T} R_{21} \left( {\tanh (\delta \hat{D}_{1} ) - \tanh (\hat{D}_{1} )} \right) \hfill \\ + W_{1}^{T} \nabla \sigma_{1} g_{1} \bar{u}_{1} R_{11}^{ - T} R_{21} \left( {\tanh (\hat{D}_{1} ) - \tanh (D_{1} )} \right) \hfill \\ + W_{1}^{T} \nabla \sigma_{1} g_{1} \bar{u}_{1} R_{11}^{ - T} R_{21} \left( {\tanh (\delta D_{1} ) - \tanh (\delta \hat{D}_{1} )} \right) \hfill \\ + W_{2}^{T} \nabla \sigma_{2} g_{1} \bar{u}_{1} \left( {\tanh (D_{1} ) - \tanh (\hat{D}_{1} )} \right) \hfill \\ + W_{2}^{T} \nabla \sigma_{2} g_{2} \bar{u}_{2} \left( {\tanh (\delta D_{2} ) - \tanh (\delta \hat{D}_{2} )} \right) \hfill \\ + \bar{u}_{1}^{2} \bar{R}_{21} \bar{\varepsilon }_{{\hat{D}_{1} }} + \bar{u}_{2}^{2} \bar{R}_{22} \bar{\varepsilon }_{{\hat{D}_{2} }} - \bar{u}_{1}^{2} \bar{R}_{21} \bar{\varepsilon }_{{D_{1} }} - \bar{u}_{2}^{2} \bar{R}_{22} \bar{\varepsilon }_{{D_{2} }} + \varepsilon_{CHJ2} \hfill \\ \end{gathered}$$

(B.4)

Now using Assumptions 1 and 2, taking norms in (B.3) and (B.4) reveals

$$\begin{gathered} \left\| {H_{1} \left( {x,\hat{W}_{1} ,\hat{u}_{1} ,\hat{u}_{2} } \right)} \right\| \le b_{f} b_{\sigma 1x} \left\| x \right\|\left\| {\tilde{W}_{1}^{T} } \right\| + b_{\sigma 1x} \left( {b_{g1} \bar{u}_{1} + b_{g2} \bar{u}_{2} } \right)\left\| {\tilde{W}_{1}^{T} } \right\| + b_{\sigma 1x} b_{g1} \bar{u}_{1} \left\| {\tilde{W}_{3}^{T} } \right\| \hfill \\ + b_{\sigma 2x} b_{g2} \bar{u}_{2} \lambda_{\hbox{max} } \left( {R_{22}^{ - T} } \right)\lambda_{\hbox{max} } \left( {R_{12} } \right)\left\| {\tilde{W}_{4}^{T} } \right\| \hfill \\ + b_{\sigma 1x} b_{g2} \bar{u}_{2} W_{1m}^{T} + b_{\sigma 1x} b_{g1} \bar{u}_{1} W_{1m}^{T} + 2b_{\sigma 2x} b_{g2} \bar{u}_{2} \lambda_{\hbox{max} } \left( {R_{22}^{ - T} } \right)\lambda_{\hbox{max} } \left( {R_{12} } \right)W_{2m}^{T} + \left\| {\varepsilon_{1} } \right\| \hfill \\ \end{gathered}$$

(B.5)

$$\begin{gathered} \left\| {H_{2} \left( {x,\hat{W}_{2} ,\hat{u}_{1} ,\hat{u}_{2} } \right)} \right\| \le b_{f} b_{\sigma 2x} \left\| x \right\|\left\| {\tilde{W}_{2}^{T} } \right\| + b_{\sigma 2x} b_{g1} \bar{u}_{1} \left\| {\tilde{W}_{2}^{T} } \right\| + b_{\sigma 2x} b_{g2} \bar{u}_{2} \left\| {\tilde{W}_{2}^{T} } \right\| \hfill \\ + b_{\sigma 2x} b_{g2} \bar{u}_{2} \left\| {\tilde{W}_{4}^{T} } \right\| + 2b_{\sigma 1x} b_{g1} \bar{u}_{1} \lambda_{\hbox{max} } \left( {R_{11}^{ - T} } \right)\lambda_{\hbox{max} } \left( {R_{21} } \right)\left\| {\tilde{W}_{3}^{T} } \right\| \hfill \\ + 2b_{\sigma 1x} b_{g1} \bar{u}_{1} \lambda_{\hbox{max} } \left( {R_{11}^{ - T} } \right)\lambda_{\hbox{max} } \left( {R_{21} } \right)W_{1m}^{T} + b_{\sigma 2x} b_{g1} \bar{u}_{1} W_{2m}^{T} + b_{\sigma 2x} b_{g2} \bar{u}_{2} W_{2m}^{T} + \left\| {\varepsilon_{2} } \right\| \hfill \\ \end{gathered}$$

(B.6)

where

$$\begin{gathered} \left\| {\varepsilon_{1} } \right\| \le \bar{u}_{1}^{2} \bar{R}_{11} \left( {\bar{\varepsilon }_{{\hat{D}_{1} }} - \bar{\varepsilon }_{{D_{1} }} } \right) + \bar{u}_{2}^{2} \bar{R}_{12} \left( {\bar{\varepsilon }_{{\hat{D}_{2} }} - \bar{\varepsilon }_{{D_{2} }} } \right) + \varepsilon_{h1} \hfill \\ \left\| {\varepsilon_{2} } \right\| \le \bar{u}_{1}^{2} \bar{R}_{21} \left( {\bar{\varepsilon }_{{\hat{D}_{1} }} - \bar{\varepsilon }_{{D_{1} }} } \right) + \bar{u}_{2}^{2} \bar{R}_{22} \left( {\bar{\varepsilon }_{{\hat{D}_{2} }} - \bar{\varepsilon }_{{D_{2} }} } \right) + \varepsilon_{h2}, \hfill \\ \end{gathered}$$

and $\varepsilon_{h1}$, $\varepsilon_{h2}$ are bounds for $\varepsilon_{CHJ1}$, $\varepsilon_{CHJ2}$, respectively. All the signals on the right-hand side of (B.5) and (B.6) are UUB. Therefore, $\left\| {H_{1} \left( {x,\hat{W}_{1} ,\hat{u}_{1} ,\hat{u}_{2} } \right)} \right\|$ and $\left\| {H_{2} \left( {x,\hat{W}_{2} ,\hat{u}_{1} ,\hat{u}_{2} } \right)} \right\|$ are UUB, and convergence to the approximate coupled HJ solutions is obtained.

b. Consider $\hat{u}_{1}$ and $\hat{u}_{2}$ in (25) and (26). Then one has

$$\begin{gathered} \left\| {u_{1} - \hat{u}_{1} } \right\| = \bar{u}_{1} \left\| { - \tanh \left( {1/(2\bar{u}_{1} )R_{11}^{ - 1} g_{1}^{T} \nabla \sigma_{1}^{T} W_{1} } \right) + \tanh \left( {1/(2\bar{u}_{1} )R_{11}^{ - 1} g_{1}^{T} \nabla \sigma_{1}^{T} \hat{W}_{3} } \right)} \right\| \hfill \\ \le \bar{u}_{1} \left\| { - \tanh \left( {1/(2\bar{u}_{1} )R_{11}^{ - 1} g_{1}^{T} \nabla \sigma_{1}^{T} W_{1} } \right) + \tanh \left( {1/(2\bar{u}_{1} )R_{11}^{ - 1} g_{1}^{T} \nabla \sigma_{1}^{T} \left( {W_{1} - \tilde{W}_{3} } \right)} \right)} \right\| \hfill \\ \end{gathered}$$

(B.7)

$$\begin{gathered} \left\| {u_{2} - \hat{u}_{2} } \right\| = \bar{u}_{2} \left\| { - \tanh \left( {1/(2\bar{u}_{2} )R_{22}^{ - 1} g_{2}^{T} \nabla \sigma_{2}^{T} W_{2} } \right) + \tanh \left( {1/(2\bar{u}_{2} )R_{22}^{ - 1} g_{2}^{T} \nabla \sigma_{2}^{T} \hat{W}_{4} } \right)} \right\| \hfill \\ \le \bar{u}_{2} \left\| { - \tanh \left( {1/(2\bar{u}_{2} )R_{22}^{ - 1} g_{2}^{T} \nabla \sigma_{2}^{T} W_{2} } \right) + \tanh \left( {1/(2\bar{u}_{2} )R_{22}^{ - 1} g_{2}^{T} \nabla \sigma_{2}^{T} \left( {W_{2} - \tilde{W}_{4} } \right)} \right)} \right\| \hfill \\ \end{gathered}$$

(B.8)

Hence, $\left\| {u_{1} - \hat{u}_{1} } \right\|$ and $\left\| {u_{2} - \hat{u}_{2} } \right\|$ are UUB. Therefore, the pair $\left( {\hat{u}_{1} ,\hat{u}_{2} } \right)$ gives the approximate Nash equilibrium solution of the game and this completes the proof.$\square$
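One way to see why (B.7) and (B.8) imply boundedness is that $\tanh$ is 1-Lipschitz, so in the scalar-input case $\left| {u_{1} - \hat{u}_{1} } \right| \le \frac{1}{2}\left| {R_{11}^{ - 1} g_{1}^{T} \nabla \sigma_{1}^{T} \tilde{W}_{3} } \right|$, and the gap inherits the UUB property of $\tilde{W}_{3}$. The snippet below is a small numerical sanity check of that inequality, a sketch under the assumption of a single scalar control; the variable `z` stands for $R_{11}^{ - 1} g_{1}^{T} \nabla \sigma_{1}^{T} W_{1}$ and `z_hat` for the same quantity with $\hat{W}_{3}$, and all names and numbers are hypothetical.

```python
import math
import random


def saturated_control(z: float, u_bar: float) -> float:
    """Scalar control of the form u = -u_bar * tanh(z / (2 * u_bar)) appearing in (B.7)."""
    return -u_bar * math.tanh(z / (2.0 * u_bar))


def control_gap(z: float, z_hat: float, u_bar: float) -> float:
    """Absolute difference between the ideal and approximate controls."""
    return abs(saturated_control(z, u_bar) - saturated_control(z_hat, u_bar))


def lipschitz_bound_holds(z: float, z_hat: float, u_bar: float, tol: float = 1e-12) -> bool:
    """Check |u - u_hat| <= 0.5 * |z - z_hat|, which follows from tanh being 1-Lipschitz."""
    return control_gap(z, z_hat, u_bar) <= 0.5 * abs(z - z_hat) + tol


# Illustrative random test points; the saturation level is a made-up value.
random.seed(0)
U_BAR = 1.5
samples = [(random.uniform(-10.0, 10.0), random.uniform(-10.0, 10.0)) for _ in range(1000)]
all_hold = all(lipschitz_bound_holds(z, z_hat, U_BAR) for z, z_hat in samples)
# The gap can also never exceed 2 * u_bar, because each control is saturated at u_bar.
within_saturation = all(control_gap(z, z_hat, U_BAR) <= 2.0 * U_BAR for z, z_hat in samples)
```

The same argument applies to $\hat{u}_{2}$ with $\bar{u}_{2}$, $R_{22}$, $g_{2}$, $\nabla \sigma_{2}$ and $\tilde{W}_{4}$.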

About this article

Cite this article

Yasini, S., Naghibi Sitani, M.B. & Kirampor, A. Reinforcement learning and neural networks for multi-agent nonzero-sum games of nonlinear constrained-input systems. Int. J. Mach. Learn. & Cyber. 7, 967–980 (2016). https://doi.org/10.1007/s13042-014-0300-y

Received: 12 February 2014
Accepted: 18 September 2014
Published: 15 October 2014
Issue Date: December 2016
DOI: https://doi.org/10.1007/s13042-014-0300-y
