Reinforcement learning and neural networks for multi-agent nonzero-sum games of nonlinear constrained-input systems

  • Original Article
  • International Journal of Machine Learning and Cybernetics

An Erratum to this article was published on 18 February 2015

Abstract

This paper presents an online adaptive optimal control method based on reinforcement learning to solve the multi-agent nonzero-sum (NZS) differential games of nonlinear constrained-input continuous-time systems. A non-quadratic cost functional associated with each agent is employed to encode the saturation nonlinearity into the NZS game. The algorithm is implemented as a separate actor-critic neural network (NN) structure for every participant in the game, where adaptation of both NNs is performed simultaneously and continuously. The technique of concurrent learning is utilized to obtain novel update laws for the critic NN weights. That is, recorded data and current data are used concurrently for adaptation of the critic NN weights. This results in an algorithm where an easier and verifiable condition is sufficient for parameter convergence rather than the restrictive persistence of excitation (PE) condition. The stability of the closed-loop systems is guaranteed and the convergence to the Nash equilibrium solution of the game is shown. Simulation results show the effectiveness of the proposed method.

References

  1. Shah V (1998) Power control for wireless data services based on utility and pricing. Dissertation, Rutgers University

  2. Mukaidani H (2007) Newton’s method for solving cross-coupled sign-indefinite algebraic Riccati equations for weakly coupled large-scale systems. J Appl Math Comput 188(1):103–115

  3. Isaacs R (1965) Differential Games. Wiley, New York

  4. Starr A, Ho Y (1969) Nonzero-sum differential games. J Optim Theory Appl 3(3):148–206

  5. Basar T, Olsder GJ (1998) Dynamic Noncooperative Game Theory, 2nd edn. SIAM, Philadelphia

  6. Li T, Gajic Z (1994) Lyapunov iterations for solving coupled algebraic Lyapunov equations of Nash differential games and algebraic Riccati equations of zero-sum games. New Trends Dynam Appl. Birkhäuser, Boston, pp 489–494

  7. Freiling G, Jank G, Abou-Kandil H (2002) On global existence of solutions to coupled matrix Riccati equations in closed-loop Nash games. IEEE Trans Autom Control 41(2):264–269

  8. Jungers M, De Pieri E, Abou-Kandil H (2007) Solving coupled Riccati equations for closed-loop Nash strategy by lack of trust approach. Int J Tomography Stat 7:49–54

  9. Sutton R (1988) Learning to predict by the methods of temporal differences. Mach Learn 3(1):9–44

  10. Lewis FL, Vrabie D (2009) Reinforcement learning and adaptive dynamic programming for feedback control. IEEE Circuits Syst Mag 9(3):32–50

  11. Lewis FL, Vrabie D, Vamvoudakis K (2012) Reinforcement learning and feedback control. IEEE Control Syst 32(6):76–105

  12. Werbos PJ (1992) Approximate dynamic programming for real-time control and neural modeling. In: White DA, Sofge DA (eds) Handbook of intelligent control. Multiscience Press, Brentwood

  13. Murray JJ, Cox CJ, Lendaris GG, Saeks R (2002) Adaptive dynamic programming. IEEE Trans Syst Man Cybern Part C Appl Rev 32(2):140–153

  14. Bertsekas DP, Tsitsiklis JN (1996) Neuro-dynamic Programming. Athena Scientific, MA

  15. Vrabie D, Lewis FL (2009) Neural network approach to continuous-time direct adaptive optimal control for partially unknown nonlinear systems. Neural Netw 22(3):237–246

  16. Vamvoudakis K, Lewis FL (2010) Online actor-critic algorithm to solve the continuous infinite time horizon optimal control problem. Automatica 46(5):878–888

  17. Bhasin S, Kamalapurkar R, Johnson M, Vamvoudakis K, Lewis FL, Dixon WE (2012) A novel actor-critic-identifier architecture for approximate optimal control of uncertain nonlinear systems. Automatica 49(1):82–92

  18. Modares H, Lewis FL, Naghibi Sistani MB (2013) Adaptive optimal control of unknown constrained-input systems using policy iteration and neural networks. IEEE Trans Neural Netw Learning Syst 24(10):1513–1525

  19. Vrabie D, Lewis FL (2011) Adaptive dynamic programming for online solution of a zero-sum differential game. J Control Theory Appl 9(3):353–360

  20. Vamvoudakis K, Lewis FL (2010) Online solution of nonlinear two-player zero-sum games using synchronous policy iteration. In: Proc. 49th IEEE CDC, pp 3040–3047

  21. Modares H, Lewis FL, Naghibi Sistani MB (2014) Online solution of nonquadratic two-player zero-sum games arising in the H∞ control of constrained input systems. Int J Adapt Cont Sig Proc 28(3–5):232–254

  22. Johnson M, Bhasin S, Dixon WE (2011) Nonlinear two-player zero-sum game approximate solution using a policy iteration algorithm. In: Proc. IEEE CDC, pp 142–147

  23. Vrabie D, Lewis FL (2010) Integral reinforcement learning for online computation of feedback Nash strategies of nonzero-sum differential games. In: Proc. 49th IEEE CDC, pp 3066–3071

  24. Vamvoudakis K, Lewis FL (2011) Multi-player non-zero-sum games: online adaptive learning solution of coupled Hamilton-Jacobi equations. Automatica 47(8):1556–1569

  25. Zhang H, Cui L, Luo Y (2013) Near-optimal control for nonzero-sum differential games of continuous-time nonlinear systems using single-network ADP. IEEE Trans Cybern 45(1):206–216

  26. Abu-Khalaf M, Lewis FL (2005) Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach. Automatica 41(5):779–791

  27. Abu-Khalaf M, Lewis FL, Huang J (2008) Neurodynamic programming and zero-sum games for constrained control systems. IEEE Trans Neural Netw 19(7):1243–1252

  28. Chowdhary GV (2010) Concurrent learning for convergence in adaptive control without persistency of excitation. Dissertation, Georgia Institute of Technology

  29. Modares H, Lewis FL, Naghibi Sistani MB, Chowdhary GV, Yucelen T (2013) Adaptive optimal control for the partially-unknown constrained-input using policy iteration with experience replay. AIAA Guidance Navigation and Control Conference, Boston, Massachusetts

  30. Modares H, Lewis FL, Naghibi Sistani MB (2014) Integral reinforcement learning and experience replay for adaptive optimal control of partially-unknown constrained-input continuous-time systems. Automatica 50(1):193–202

  31. Yasini S, Karimpour A, Naghibi Sistani MB, Modares H (2014) Online concurrent reinforcement learning algorithm to solve two-player zero-sum games for partially unknown nonlinear continuous-time systems. Int J Adapt Cont Sig Proc. doi:10.1002/acs.2485

  32. Lewis FL, Vrabie D, Syrmos VL (2012) Optimal control, 3rd edn. Wiley, New York

  33. Lyshevski SE (1998) Optimal control of nonlinear continuous-time systems: design of bounded controllers via generalized nonquadratic functionals. In: Proc. IEEE ACC, pp 205–209

  34. Hornik K, Stinchcombe M, White H (1990) Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks. Neural Netw 3(5):551–560

  35. Wang XZ, Li CG, Yeung DS, Song S, Feng H (2008) A definition of partial derivative of random functions and its application to RBFNN sensitivity analysis. Neurocomputing 71(7–9):1515–1526

  36. Ghazikhani A, Monsefi R, Sadoghi Yazdi H (2014) Online neural network model for non-stationary and imbalanced data stream classification. Int J Mach Learn Cyber 5(1):51–62. doi:10.1007/s13042-013-0180-6

  37. Barakat M, Lefebvre D, Khalil M, Druaux F, Mustapha O (2013) Parameter selection algorithm with self adaptive growing neural network classifier for diagnosis issues. Int J Mach Learn Cyber 4(3):217–233. doi:10.1007/s13042-012-0089-5

  38. Nevistić V, Primbs JA (1996) Constrained nonlinear optimal control: A converse HJB approach. California Institute of Technology, Tech. Rep

  39. Raja R, Karthik Raja U, Samidurai R, Leelamani A (2014) Dynamic analysis of discrete-time BAM neural networks with stochastic perturbations and impulses. Int J Mach Learn Cyber 5(1):39–50. doi:10.1007/s13042-013-0199-8

  40. Hardy G, Littlewood J, Polya G (1998) Inequalities, 2nd edn. Cambridge University Press, Cambridge

Author information

Corresponding author

Correspondence to Mohammad Bagher Naghibi Sitani.

Additional information

An erratum to this article is available at http://dx.doi.org/10.1007/s13042-015-0336-7.

Appendices

Appendix A

Proof of Theorem 1

The convergence proof is based on Lyapunov analysis. We consider the following positive definite Lyapunov candidate

$$L = V_{1} (x) + V_{2} (x) + \overbrace {{\frac{1}{2}\tilde{W}_{1}^{T} a_{1}^{ - 1} \tilde{W}_{1} }}^{{L_{1} }} + \overbrace {{\frac{1}{2}\tilde{W}_{2}^{T} a_{2}^{ - 1} \tilde{W}_{2} }}^{{L_{2} }} + \frac{1}{2}\tilde{W}_{3}^{T} a_{3}^{ - 1} \tilde{W}_{3} + \frac{1}{2}\tilde{W}_{4}^{T} a_{4}^{ - 1} \tilde{W}_{4}$$
(A.1)

where \(V_{1} (x)\), \(V_{2} (x)\) are approximate solutions to the constrained coupled HJ Eq. 10. The derivative of the Lyapunov function is given by

$$\dot{L}(x) = \dot{V}_{1} (x) + \dot{V}_{2} (x) + \overbrace {{\tilde{W}_{1}^{T} a_{1}^{ - 1} \dot{\tilde{W}}_{1} }}^{{\dot{L}_{1} }} + \overbrace {{\tilde{W}_{2}^{T} a_{2}^{ - 1} \dot{\tilde{W}}_{2} }}^{{\dot{L}_{2} }} + \tilde{W}_{3}^{T} a_{3}^{ - 1} \dot{\tilde{W}}_{3} + \tilde{W}_{4}^{T} a_{4}^{ - 1} \dot{\tilde{W}}_{4}$$
(A.2)

The first term in (A.2) is

$$\begin{gathered} \dot{V}_{1} (x) = \nabla V_{1} \dot{x} = \left( {W_{1}^{T} \nabla \sigma_{1} + \nabla \varepsilon_{1}^{T} } \right)\,\left( {f + g_{1} \hat{u}_{1} + g_{2} \hat{u}_{2} } \right) \hfill \\ = W_{1}^{T} \nabla \sigma_{1} f - W_{1}^{T} \nabla \sigma_{1} g_{1} \bar{u}_{1} \tanh (\hat{D}_{1} ) - W_{1}^{T} \nabla \sigma_{1} g_{2} \bar{u}_{2} \tanh (\hat{D}_{2} ) + \varepsilon^{\prime}_{1} (x) \hfill \\ \end{gathered}$$
(A.3)

where \(\varepsilon^{\prime}_{1} (x) = \nabla \varepsilon_{1}^{T} (f - g_{1} \bar{u}_{1} \tanh (\hat{D}_{1} ) - g_{2} \bar{u}_{2} \tanh (\hat{D}_{2} ))\).

Adding and subtracting the terms \(W_{1}^{T} \nabla \sigma_{1} g_{1} \bar{u}_{1} \tanh (D_{1} )\) and \(W_{1}^{T} \nabla \sigma_{1} g_{2} \bar{u}_{2} \tanh (D_{2} )\) in (A.3) gives

$$\begin{gathered} \dot{V}_{1} (x) = W_{1}^{T} \varsigma_{1} (t) \hfill \\ + W_{1}^{T} \nabla \sigma_{1} g_{1} \bar{u}_{1} \left( {\tanh (D_{1} ) - \tanh (\hat{D}_{1} )} \right)\, + W_{1}^{T} \nabla \sigma_{1} g_{2} \bar{u}_{2} \left( {\tanh (D_{2} ) - \tanh (\hat{D}_{2} )} \right) + \varepsilon^{\prime}_{1} (x) \hfill \\ \end{gathered}$$
(A.4)

From the HJ Eq. 14 we have

$$W_{1}^{T} \varsigma_{1} (t) = - Q_{1} (x(t)) - M_{1} (u_{1} (t)) - M_{1} (u_{2} (t)) + \varepsilon_{CHJ1}$$
(A.5)

The terms \(M_{1} (u_{1} (t))\) and \(M_{1} (u_{2} (t))\) are obtained as follows by substituting (9) into (3)

$$M_{1} (u_{1} (t)) = \bar{u}_{1} W_{1}^{T} \nabla \sigma_{1} g_{1} \tanh (D_{1} ) + \bar{u}_{1}^{2} \bar{R}_{11} \ln (\underline{{\mathbf{1}}} - \tanh^{2} (D_{1} ))$$
(A.6)
$$M_{1} (u_{2} (t)) = \bar{u}_{2} W_{2}^{T} \nabla \sigma_{2} g_{2} R_{22}^{ - T} R_{12} \tanh (D_{2} ) + \bar{u}_{2}^{2} \bar{R}_{12} \ln (\underline{{\mathbf{1}}} - \tanh^{2} (D_{2} ))$$
(A.7)

where \(\underline{{\mathbf{1}}}\) is a column vector with all elements equal to one, and \(\bar{R}_{11} \in \Re^{{1 \times m_{1} }}\) and \(\bar{R}_{12} \in \Re^{{1 \times m_{2} }}\) are row vectors whose elements are the main-diagonal elements of \(R_{11}\) and \(R_{12}\), respectively. Substituting (A.5) into (A.4) gives

$$\begin{gathered} \dot{V}_{1} (x) = - Q_{1} (x(t)) - M_{1} (u_{1} (t)) - M_{1} (u_{2} (t))\, + W_{1}^{T} \nabla \sigma_{1} g_{1} \bar{u}_{1} \left( {\tanh (D_{1} ) - \tanh (\hat{D}_{1} )} \right) \hfill \\ + W_{1}^{T} \nabla \sigma_{1} g_{2} \bar{u}_{2} \left( {\tanh (D_{2} ) - \tanh (\hat{D}_{2} )} \right) + \varepsilon^{\prime}_{1} (x) + \varepsilon_{CHJ1} \hfill \\ \end{gathered}$$
(A.8)
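The closed form in (A.6) can be checked numerically in the scalar case. The sketch below is illustrative only: it assumes that the cost in (3), which is not reproduced in this appendix, is the usual constrained-input integral \(M(u) = 2\int_{0}^{u} \bar{u}\tanh^{-1}(v/\bar{u})\,r\,dv\), and that the control is \(u = -\bar{u}\tanh(D)\). The values of `u_bar`, `r` and `D` are arbitrary. In the scalar case \(\bar{u}W_{1}^{T}\nabla\sigma_{1}g_{1} = 2\bar{u}^{2} r D\), so (A.6) reduces to the expression in `cost_closed_form`.

```python
import numpy as np

def cost_by_quadrature(u, u_bar, r, n=20001):
    """Trapezoidal approximation of 2 * int_0^u u_bar * atanh(v / u_bar) * r dv."""
    v = np.linspace(0.0, u, n)
    y = 2.0 * u_bar * np.arctanh(v / u_bar) * r
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(v)))

def cost_closed_form(D, u_bar, r):
    """Scalar form of (A.6): 2 u_bar^2 r D tanh(D) + u_bar^2 r ln(1 - tanh^2(D))."""
    t = np.tanh(D)
    return float(2.0 * u_bar**2 * r * D * t + u_bar**2 * r * np.log(1.0 - t**2))

# Illustrative values (assumptions, not taken from the paper).
u_bar, r, D = 2.0, 0.5, 0.8
u = -u_bar * np.tanh(D)  # saturated control, |u| < u_bar

m_quad = cost_by_quadrature(u, u_bar, r)
m_closed = cost_closed_form(D, u_bar, r)
print(m_quad, m_closed)
```

Under these assumptions the two values agree to quadrature accuracy and are positive, which is consistent with the positive definiteness of \(M_{1}\) used in the step leading to (A.10).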

According to Assumptions 1 and 2, one can easily show that

$$\left\| {\varepsilon^{\prime}_{1} (x)} \right\| \le b_{\varepsilon 1x} b_{f} \left\| x \right\| + b_{\varepsilon 1x} \left( {\bar{u}_{1} b_{g1} + \bar{u}_{2} b_{g2} } \right).$$
(A.9)

Next, using (A.9) and the fact that \(M_{1} (u_{1} (t))\) and \(M_{1} (u_{2} (t))\) are positive definite, and noting that since \(Q_{1} (x) \ge 0\), there exists a \(q_{1}\) such that \(x^{T} q_{1} x < Q_{1} (x)\) for \(x \in \varOmega\), (A.8) becomes

$$\dot{V}_{1} < - x^{T} q_{1} x + b_{\varepsilon 1x} b_{f} \left\| x \right\| + 2\bar{u}_{1} b_{g1} b_{\sigma 1x} W_{1} + 2\bar{u}_{2} b_{g2} b_{\sigma 1x} W_{1} + b_{\varepsilon 1x} \left( {\bar{u}_{1} b_{g1} + \bar{u}_{2} b_{g2} } \right) + \varepsilon_{h1}$$
(A.10)

where \(\varepsilon_{h1}\) is the bound for \(\varepsilon_{CHJ1}\).

Denoting \(k^{\prime}_{1} = b_{\varepsilon 1x} b_{f}\), \(k^{\prime}_{2} = 2b_{\sigma 1x} W_{1} \left( {\bar{u}_{1} b_{g1} + \bar{u}_{2} b_{g2} } \right) + b_{\varepsilon 1x} \left( {\bar{u}_{1} b_{g1} + \bar{u}_{2} b_{g2} } \right) + \varepsilon_{h1}\), (A.10) becomes

$$\dot{V}_{1} < - x^{T} q_{1} x + k^{\prime}_{1} \left\| x \right\| + k^{\prime}_{2}$$
(A.11)

Similarly, by noting that \(x^{T} q_{2} x < Q_{2} (x)\) for \(x \in \varOmega\), for the second term in (A.2) one obtains

$$\dot{V}_{2} < - x^{T} q_{2} x + k^{\prime\prime}_{1} \left\| x \right\| + k^{\prime\prime}_{2}$$
(A.12)

where \(k^{\prime\prime}_{1} = b_{\varepsilon 2x} b_{f}\), \(k^{\prime\prime}_{2} = 2b_{\sigma 2x} W_{2} \left( {\bar{u}_{1} b_{g1} + \bar{u}_{2} b_{g2} } \right) + b_{\varepsilon 2x} \left( {\bar{u}_{1} b_{g1} + \bar{u}_{2} b_{g2} } \right) + \varepsilon_{h2}\), and \(\varepsilon_{h2}\) is the bound for \(\varepsilon_{CHJ2}\).

Using (33) and the fact that \(\dot{\tilde{W}}_{1} = - \dot{\hat{W}}_{1}\), the third term of (A.2) is obtained as

$$\dot{L}_{1} = \tilde{W}_{1}^{T} a_{1} a_{1}^{ - 1} \left( {\frac{{\hat{\varsigma }_{1} (t)}}{{\left( {\hat{\varsigma }_{1}^{T} (t)\hat{\varsigma }_{1} (t) + 1} \right)^{2} }}e_{1} (t) + \sum\limits_{k = 1}^{l} {\frac{{\hat{\varsigma }_{1k} }}{{\left( {\hat{\varsigma }_{1k}^{T} \hat{\varsigma }_{1k} + 1} \right)^{2} }}e_{1} (t_{k} )} } \right)$$
(A.13)

where \(\hat{\varsigma }_{1k} = \hat{\varsigma }_{1} (t_{k} )\). For \(e_{1} (t)\) in (A.13) we have

$$e_{1} (t) = Q_{1} (x) + M_{1} (\hat{u}_{1} (t)) + M_{1} (\hat{u}_{2} (t)) + \hat{W}_{1}^{T} \hat{\varsigma }_{1} (t)$$
(A.14)
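The critic adaptation implied by (A.13) and (A.14) can be illustrated with a small simulation. This is a minimal sketch, not the authors' implementation. The regressors, gain `a1`, step size and "true" weights are made-up values. The targets are generated so that the residual is exactly zero at the true weights, which amounts to neglecting \(\varepsilon_{CHJ1}\). Forward Euler integration stands in for continuous-time adaptation. The sketch also assumes that Condition 1 is the usual concurrent-learning rank condition, namely that the recorded regressors span the weight space.

```python
import numpy as np

def critic_rate(w_hat, a1, sigma_now, target_now, memory):
    """d(w_hat)/dt: normalized gradient step on current plus recorded residuals.

    The residual is e = target + w_hat^T sigma, where target plays the role of
    Q_1 + M_1(u_1) + M_1(u_2) in (A.14) and sigma plays the role of the regressor.
    """
    def normalized_term(sigma, target):
        e = target + w_hat @ sigma
        return sigma * e / (sigma @ sigma + 1.0) ** 2

    grad = normalized_term(sigma_now, target_now)
    for sigma_k, target_k in memory:
        grad = grad + normalized_term(sigma_k, target_k)
    return -a1 * grad

def simulate(memory, w_true, sigma_now, a1=5.0, dt=0.05, steps=4000):
    """Forward-Euler integration of the critic law from zero initial weights."""
    w_hat = np.zeros_like(w_true)
    target_now = -w_true @ sigma_now  # zero residual at the true weights
    for _ in range(steps):
        w_hat = w_hat + dt * critic_rate(w_hat, a1, sigma_now, target_now, memory)
    return w_hat

# Hypothetical data: a constant (non-exciting) current regressor and three
# recorded regressors that span R^3.
w_true = np.array([1.0, -2.0, 0.5])
sigma_now = np.array([1.0, 1.0, 0.0])
memory = [(s, -w_true @ s) for s in np.eye(3)]

w_with_memory = simulate(memory, w_true, sigma_now)
w_without_memory = simulate([], w_true, sigma_now)
print(w_with_memory, w_without_memory)
```

In this toy setting the weights converge when the recorded data are used, while the estimate driven only by the constant current regressor stays away from the true weights. This mirrors the role of recorded data in replacing the persistence of excitation requirement.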

Adding zero from (17) to (A.14) gives

$$\begin{gathered} e_{1} (t) = M_{1} (\hat{u}_{2} (t)) - M_{1} (u_{2} (t)) + M_{1} (\hat{u}_{1} (t)) - M_{1} (u_{1} (t)) - \tilde{W}_{1}^{T} \hat{\varsigma }_{1} (t) + \varepsilon_{CHJ1} \hfill \\ + W_{1}^{T} \nabla \sigma_{1} \left( {f - g_{1} \bar{u}_{1} \tanh (\hat{D}_{1} ) - g_{2} \bar{u}_{2} \tanh (\hat{D}_{2} )} \right) \hfill \\ - W_{1}^{T} \nabla \sigma_{1} \left( {f - g_{1} \bar{u}_{1} \tanh (D_{1} ) - g_{2} \bar{u}_{2} \tanh (D_{2} )} \right) \hfill \\ \end{gathered}$$
(A.15)

where \(M_{1} (\hat{u}_{1} (t))\) and \(M_{1} (\hat{u}_{2} (t))\) are obtained as follows by substituting (25) and (26) into (3)

$$M_{1} (\hat{u}_{1} (t)) = \bar{u}_{1} \hat{W}_{3}^{T} \nabla \sigma_{1} g_{1} \tanh (\hat{D}_{1} ) + \bar{u}_{1}^{2} \bar{R}_{11} \ln (\underline{{\mathbf{1}}} - \tanh^{2} (\hat{D}_{1} ))$$
(A.16)
$$M_{1} (\hat{u}_{2} (t)) = \bar{u}_{2} \hat{W}_{4}^{T} \nabla \sigma_{2} g_{2} R_{22}^{ - T} R_{12} \tanh (\hat{D}_{2} ) + \bar{u}_{2}^{2} \bar{R}_{12} \ln (\underline{{\mathbf{1}}} - \tanh^{2} (\hat{D}_{2} ))$$
(A.17)

Using (A.6) and (A.16), \(M_{1} (\hat{u}_{1} (t)) - M_{1} (u_{1} (t))\) is obtained as

$$\begin{gathered} M_{1} (\hat{u}_{1} (t)) - M_{1} (u_{1} (t)) = \hat{W}_{3}^{T} \nabla \sigma_{1} g_{1} \bar{u}_{1} \tanh (\hat{D}_{1} ) + \bar{u}_{1}^{2} \bar{R}_{11} \ln (\underline{{\mathbf{1}}} - \tanh^{2} (\hat{D}_{1} )) \hfill \\ - W_{1}^{T} \nabla \sigma_{1} g_{1} \bar{u}_{1} \tanh (D_{1} ) - \bar{u}_{1}^{2} \bar{R}_{11} \ln (\underline{{\mathbf{1}}} - \tanh^{2} (D_{1} )) \hfill \\ \end{gathered}$$
(A.18)

Next, the term \(\ln (\underline{{\mathbf{1}}} - \tanh^{2} (D_{1} ))\) can be closely approximated as [30]

$$\ln (\underline{{\mathbf{1}}} - \tanh^{2} (D_{1} )) \approx - 2D_{1} \text{sgn} (D_{1} ) + \varepsilon_{{D_{1} }} \approx - 2D_{1} \tanh (\delta D_{1} ) + \bar{\varepsilon }_{{D_{1} }}$$
(A.19)

where \(\delta\) is a large positive constant, \(\bar{\varepsilon }_{{D_{1} }}\) is a bounded approximation error, and \(D_{1} = (1/2\bar{u}_{1} )R_{11}^{ - 1} g_{1}^{T} \nabla \sigma_{1}^{T} W_{1}\).
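The boundedness of the approximation errors in (A.19) can be checked numerically for a scalar argument. The sketch below is only an illustration. It uses the identity \(\ln (1 - \tanh^{2} (d)) = - 2\ln \cosh (d)\), from which the error of the sign-based approximation is \(2\ln 2 - 2\ln (1 + e^{ - 2|d|} )\) and therefore lies in \([0,\,2\ln 2)\). The grid and the value `delta = 50` are arbitrary choices.

```python
import numpy as np

def log_sech2(d):
    """Numerically stable ln(1 - tanh^2(d)) = -2 ln cosh(d)."""
    return -2.0 * (np.logaddexp(d, -d) - np.log(2.0))

d = np.linspace(-30.0, 30.0, 6001)

# Error of the first approximation in (A.19): ln(1 - tanh^2 d) + 2 d sgn(d).
eps_sgn = log_sech2(d) + 2.0 * d * np.sign(d)

# Error of the smooth approximation with a large slope delta (illustrative value).
delta = 50.0
eps_tanh = log_sech2(d) + 2.0 * d * np.tanh(delta * d)

print(eps_sgn.min(), eps_sgn.max(), 2.0 * np.log(2.0))
print(np.max(np.abs(eps_tanh - eps_sgn)))
```

On this grid both errors stay bounded, and replacing the sign function by \(\tanh (\delta d)\) changes the error by an amount that shrinks as \(\delta\) grows.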

Using (A.19), and adding and subtracting \(W_{1}^{T} \nabla \sigma_{1} g_{1} \bar{u}_{1} \tanh (\delta \hat{D}_{1} )\) in (A.18), one obtains

$$\begin{gathered} M_{1} (\hat{u}_{1} (t)) - M_{1} (u_{1} (t)) = \hat{W}_{3}^{T} \nabla \sigma_{1} g_{1} \bar{u}_{1} \tanh (\hat{D}_{1} ) + \tilde{W}_{3}^{T} \nabla \sigma_{1} g_{1} \bar{u}_{1} \tanh (\delta \hat{D}_{1} ) \hfill \\ - W_{1}^{T} \nabla \sigma_{1} g_{1} \bar{u}_{1} \tanh (D_{1} ) - W_{1}^{T} \nabla \sigma_{1} g_{1} \bar{u}_{1} \left( {\tanh (\delta \hat{D}_{1} ) - \tanh (\delta D_{1} )} \right) \hfill \\ + \bar{u}_{1}^{2} \bar{R}_{11} \left( {\bar{\varepsilon }_{{\hat{D}_{1} }} - \bar{\varepsilon }_{{D_{1} }} } \right) \hfill \\ \end{gathered}$$
(A.20)

Likewise, \(M_{1} (\hat{u}_{2} (t)) - M_{1} (u_{2} (t))\) is obtained as

$$\begin{gathered} M_{1} (\hat{u}_{2} (t)) - M_{1} (u_{2} (t)) = \hat{W}_{4}^{T} \nabla \sigma_{2} g_{2} \bar{u}_{2} R_{22}^{ - T} R_{12} \tanh (\hat{D}_{2} ) + \tilde{W}_{4}^{T} \nabla \sigma_{2} g_{2} \bar{u}_{2} R_{22}^{ - T} R_{12} \tanh (\delta \hat{D}_{2} ) \hfill \\ - W_{2}^{T} \nabla \sigma_{2} g_{2} \bar{u}_{2} R_{22}^{ - T} R_{12} \left( {\tanh (\delta \hat{D}_{2} ) - \tanh (\delta D_{2} )} \right) \hfill \\ - W_{2}^{T} \nabla \sigma_{2} g_{2} \bar{u}_{2} R_{22}^{ - T} R_{12} \tanh (D_{2} ) + \bar{u}_{2}^{2} \bar{R}_{12} \left( {\bar{\varepsilon }_{{\hat{D}_{2} }} - \bar{\varepsilon }_{{D_{2} }} } \right) \hfill \\ \end{gathered}$$
(A.21)

Substituting (A.20) and (A.21) into (A.15) and doing some manipulations gives

$$e_{1} (t) = - \tilde{W}_{1}^{T} (t)\hat{\varsigma }_{1} (t) - \tilde{W}_{3}^{T} \Pi_{1} (t) - \tilde{W}_{4}^{T} \Pi_{2} (t) + \Xi_{1} (t)$$
(A.22)

where \(\Pi_{1}\) and \(\Pi_{2}\) are defined in (31) and (32), and the bounded term \(\Xi_{1} (t)\) is

$$\begin{gathered} \Xi_{1} (t) = W_{1}^{T} \nabla \sigma_{1} g_{2} \bar{u}_{2} \left( {\tanh (D_{2} ) - \tanh (\hat{D}_{2} )} \right) + W_{1}^{T} \nabla \sigma_{1} g_{1} \bar{u}_{1} \left( {\tanh (\delta D_{1} ) - \tanh (\delta \hat{D}_{1} )} \right) \hfill \\ + W_{2}^{T} \nabla \sigma_{2} g_{2} \bar{u}_{2} R_{22}^{ - T} R_{12} \left( {\tanh (\hat{D}_{2} ) - \tanh (D_{2} )} \right) \hfill \\ + W_{2}^{T} \nabla \sigma_{2} g_{2} \bar{u}_{2} \left( {\tanh (\delta D_{2} ) - \tanh (\delta \hat{D}_{2} )} \right) \hfill \\ + \varepsilon_{h1} + \bar{u}_{1}^{2} \bar{R}_{11} \left( {\bar{\varepsilon }_{{\hat{D}_{1} }} - \bar{\varepsilon }_{{D_{1} }} } \right) + \bar{u}_{2}^{2} \bar{R}_{12} \left( {\bar{\varepsilon }_{{\hat{D}_{2} }} - \bar{\varepsilon }_{{D_{2} }} } \right) \hfill \\ \end{gathered}$$
(A.23)

Similarly, \(e_{1} (t_{k} )\) in (A.13) is obtained as

$$e_{1} (t_{k} ) = - \tilde{W}_{1}^{T} \hat{\varsigma }_{1k} - \tilde{W}_{3}^{T} \Pi_{1} (t_{k} ) - \tilde{W}_{4}^{T} \Pi_{2} (t_{k} ) + \Xi_{1} (t_{k} ).$$
(A.24)

Substituting (A.22) and (A.24) into (A.13), one gets

$$\begin{gathered} \dot{L}_{1} = \tilde{W}_{1}^{T} \left( {\frac{{\hat{\bar{\varsigma }}_{1} }}{{s_{1} }}\left( { - \hat{\varsigma }_{1}^{T} (t)\tilde{W}_{1} - \Pi_{1}^{T} (t)\tilde{W}_{3} - \Pi_{2}^{T} (t)\tilde{W}_{4} + \Xi_{1} (t)} \right)} \right) \hfill \\ + \tilde{W}_{1}^{T} \left( {\sum\limits_{k = 1}^{l} {\frac{{\hat{\bar{\varsigma }}_{1k} }}{{s_{1k} }}\left( { - \hat{\varsigma }_{1k}^{T} \tilde{W}_{1} - \Pi_{1}^{T} (t_{k} )\tilde{W}_{3} - \Pi_{2}^{T} (t_{k} )\tilde{W}_{4} + \Xi_{1} (t_{k} )} \right)} } \right) \hfill \\ \end{gathered}$$
(A.25)

where \(\hat{\bar{\varsigma }}_{1} = \hat{\varsigma }_{1} (t)/(\hat{\varsigma }_{1}^{T} (t)\hat{\varsigma }_{1} (t) + 1)\), \(\hat{\bar{\varsigma }}_{1k} \equiv \hat{\bar{\varsigma }}_{1} (t_{k} ) = \hat{\varsigma }_{1k} /(\hat{\varsigma }_{1k}^{T} \hat{\varsigma }_{1k} + 1)\), \(s_{1} = \hat{\varsigma }_{1}^{T} (t)\hat{\varsigma }_{1} (t) + 1\), \(s_{1k} \equiv s_{1} (t_{k} ) = \hat{\varsigma }_{1k}^{T} \hat{\varsigma }_{1k} + 1\).

Denoting \({\rm T}_{1k} = - \tilde{W}_{3}^{T} \Pi_{1} (t_{k} ) - \tilde{W}_{4}^{T} \Pi_{2} (t_{k} )\) and \(\Xi_{1k} = \Xi_{1} (t_{k} )\), (A.25) becomes

$$\begin{gathered} \dot{L}_{1} = - \tilde{W}_{1}^{T} \left[ {\hat{\bar{\varsigma }}_{1} \hat{\bar{\varsigma }}_{1}^{T} + \sum\limits_{k = 1}^{l} {\hat{\bar{\varsigma }}_{1k} \hat{\varsigma }_{1k}^{T} } } \right]\tilde{W}_{1} (t) + \tilde{W}_{1}^{T} \left( {\frac{{\hat{\bar{\varsigma }}_{1} }}{{s_{1} }}\Xi_{1} (t) + \sum\limits_{k = 1}^{l} {\frac{{\hat{\bar{\varsigma }}_{1k} }}{{s_{1k} }}} \left( {T_{1k} + \Xi_{1k} } \right)} \right) \hfill \\ - \tilde{W}_{3}^{T} \Pi_{1} (t)\frac{{\hat{\bar{\varsigma }}_{1}^{T} }}{{s_{1} }}\tilde{W}_{1} - \tilde{W}_{4}^{T} \Pi_{2} (t)\frac{{\hat{\bar{\varsigma }}_{1}^{T} }}{{s_{1} }}\tilde{W}_{1} \hfill \\ \end{gathered}$$
(A.26)

Note that \({\rm T}_{1k}\) depends on the actor NN errors of the recorded past times. Now, using \(\tilde{W}_{1} = W_{1} - \hat{W}_{1}\), the third term in (A.2) is obtained as

$$\begin{gathered} \dot{L}_{1} = - \tilde{W}_{1}^{T} \left[ {\hat{\bar{\varsigma }}_{1} \hat{\bar{\varsigma }}_{1}^{T} + \sum\limits_{k = 1}^{l} {\hat{\bar{\varsigma }}_{1k} \hat{\varsigma }_{1k}^{T} } } \right]\tilde{W}_{1} + \tilde{W}_{1}^{T} \left( {\frac{{\hat{\bar{\varsigma }}_{1} }}{{s_{1} }}\Xi_{1} (t) + \sum\limits_{k = 1}^{l} {\frac{{\hat{\bar{\varsigma }}_{1k} }}{{s_{1k} }}} \left( {T_{1k} + \Xi_{1k} } \right)} \right) \hfill \\ - \tilde{W}_{3}^{T} \Pi_{1} (t)\frac{{\hat{\bar{\varsigma }}_{1}^{T} }}{{s_{1} }}W_{1} + \tilde{W}_{3}^{T} \Pi_{1} (t)\frac{{\hat{\bar{\varsigma }}_{1}^{T} }}{{s_{1} }}\hat{W}_{1} - \tilde{W}_{4}^{T} \Pi_{2} (t)\frac{{\hat{\bar{\varsigma }}_{1}^{T} }}{{s_{1} }}W_{1} + \tilde{W}_{4}^{T} \Pi_{2} (t)\frac{{\hat{\bar{\varsigma }}_{1}^{T} }}{{s_{1} }}\hat{W}_{1} \hfill \\ \end{gathered}$$
(A.27)

Similarly, the fourth term in (A.2) can be written as

$$\begin{gathered} \dot{L}_{2} = - \tilde{W}_{2}^{T} \left[ {\hat{\bar{\varsigma }}_{2} \hat{\bar{\varsigma }}_{2}^{T} + \sum\limits_{k = 1}^{l} {\hat{\bar{\varsigma }}_{2k} \hat{\varsigma }_{2k}^{T} } } \right]\tilde{W}_{2} + \tilde{W}_{2}^{T} \left( {\frac{{\hat{\bar{\varsigma }}_{2} }}{{s_{2} }}\Xi_{2} (t) + \sum\limits_{k = 1}^{l} {\frac{{\hat{\bar{\varsigma }}_{2k} }}{{s_{2k} }}} \left( {T_{2k} + \Xi_{2k} } \right)} \right) \hfill \\ - \tilde{W}_{4}^{T} \Pi^{\prime}_{2} (t)\frac{{\hat{\bar{\varsigma }}_{2}^{T} }}{{s_{2} }}W_{2} + \tilde{W}_{4}^{T} \Pi^{\prime}_{2} (t)\frac{{\hat{\bar{\varsigma }}_{2}^{T} }}{{s_{2} }}\hat{W}_{2} - \tilde{W}_{3}^{T} \Pi^{\prime}_{1} (t)\frac{{\hat{\bar{\varsigma }}_{2}^{T} }}{{s_{2} }}W_{2} + \tilde{W}_{3}^{T} \Pi^{\prime}_{1} (t)\frac{{\hat{\bar{\varsigma }}_{2}^{T} }}{{s_{2} }}\hat{W}_{2} \hfill \\ \end{gathered}$$
(A.28)

where \(\Pi^{\prime}_{1}\) and \(\Pi^{\prime}_{2}\) are defined as (33) and (34), \(\hat{\bar{\varsigma }}_{2} = \hat{\varsigma }_{2} (t)/(\hat{\varsigma }_{2}^{T} (t)\hat{\varsigma }_{2} (t) + 1)\), \(\hat{\bar{\varsigma }}_{2k} \equiv \hat{\bar{\varsigma }}_{2} (t_{k} ) = \hat{\varsigma }_{2k} /(\hat{\varsigma }_{2k}^{T} \hat{\varsigma }_{2k} + 1)\), \(s_{2} = \hat{\varsigma }_{2}^{T} (t)\hat{\varsigma }_{2} (t) + 1\), \(s_{2k} = \hat{\varsigma }_{2k}^{T} \hat{\varsigma }_{2k} + 1\), \({\rm T}_{2k} = - \tilde{W}_{3}^{T} \Pi^{\prime}_{1} (t_{k} ) - \tilde{W}_{4}^{T} \Pi^{\prime}_{2} (t_{k} )\), and the bounded term \(\Xi_{2} (t)\) is

$$\begin{gathered} \Xi_{2} (t) = W_{2}^{T} \nabla \sigma_{2} g_{1} \bar{u}_{1} \left( {\tanh (D_{1} ) - \tanh (\hat{D}_{1} )} \right) + W_{2}^{T} \nabla \sigma_{2} g_{2} \bar{u}_{2} \left( {\tanh (\delta D_{2} ) - \tanh (\delta \hat{D}_{2} )} \right) \hfill \\ + W_{1}^{T} \nabla \sigma_{1} g_{1} \bar{u}_{1} R_{11}^{ - T} R_{21} \left( {\tanh (\hat{D}_{1} ) - \tanh (D_{1} )} \right) + W_{1}^{T} \nabla \sigma_{1} g_{1} \bar{u}_{1} \left( {\tanh (\delta D_{1} ) - \tanh (\delta \hat{D}_{1} )} \right) \hfill \\ + \varepsilon_{h2} + \bar{u}_{2}^{2} \bar{R}_{22} \left( {\bar{\varepsilon }_{{\hat{D}_{2} }} - \bar{\varepsilon }_{{D_{2} }} } \right) + \bar{u}_{1}^{2} \bar{R}_{21} \left( {\bar{\varepsilon }_{{\hat{D}_{1} }} - \bar{\varepsilon }_{{D_{1} }} } \right) \hfill \\ \end{gathered}$$
(A.29)

Next, using (A.11)–(A.12) and (A.27)–(A.28), the derivative of Lyapunov function (A.2) becomes

$$\begin{gathered} \dot{L} < - x^{T} qx + k_{1} \left\| x \right\| + k_{2} \hfill \\ - \tilde{W}_{1}^{T} \left[ {\hat{\bar{\varsigma }}_{1} \hat{\bar{\varsigma }}_{1}^{T} + \sum\limits_{k = 1}^{l} {\hat{\bar{\varsigma }}_{1k} \hat{\varsigma }_{1k}^{T} } } \right]\tilde{W}_{1} + \tilde{W}_{1}^{T} \left( {\frac{{\hat{\bar{\varsigma }}_{1} }}{{s_{1} }}\Xi_{1} (t) + \sum\limits_{k = 1}^{l} {\frac{{\hat{\bar{\varsigma }}_{1k} }}{{s_{1k} }}} \left( {T_{1k} + \Xi_{1k} } \right)} \right) \hfill \\ - \tilde{W}_{2}^{T} \left[ {\hat{\bar{\varsigma }}_{2} \hat{\bar{\varsigma }}_{2}^{T} + \sum\limits_{k = 1}^{l} {\hat{\bar{\varsigma }}_{2k} \hat{\varsigma }_{2k}^{T} } } \right]\tilde{W}_{2} + \tilde{W}_{2}^{T} \left( {\frac{{\hat{\bar{\varsigma }}_{2} }}{{s_{2} }}\Xi_{2} (t) + \sum\limits_{k = 1}^{l} {\frac{{\hat{\bar{\varsigma }}_{2k} }}{{s_{2k} }}} \left( {T_{2k} + \Xi_{2k} } \right)} \right) \hfill \\ - \tilde{W}_{3}^{T} \left( {a_{3}^{ - 1} \dot{\hat{W}}_{3} - \Pi_{1} (t)\frac{{\hat{\bar{\varsigma }}_{1}^{T} }}{{s_{1} }}\hat{W}_{1} - \Pi^{\prime}_{1} (t)\frac{{\hat{\bar{\varsigma }}_{2}^{T} }}{{s_{2} }}\hat{W}_{2} } \right) - \tilde{W}_{3}^{T} \left( {\Pi_{1} (t)\frac{{\hat{\bar{\varsigma }}_{1}^{T} }}{{s_{1} }}W_{1} + \Pi^{\prime}_{1} (t)\frac{{\hat{\bar{\varsigma }}_{2}^{T} }}{{s_{2} }}W_{2} } \right) \hfill \\ - \tilde{W}_{4}^{T} \left( {a_{4}^{ - 1} \dot{\hat{W}}_{4} - \Pi_{2} (t)\frac{{\hat{\bar{\varsigma }}_{1}^{T} }}{{s_{1} }}\hat{W}_{1} - \Pi^{\prime}_{2} (t)\frac{{\hat{\bar{\varsigma }}_{2}^{T} }}{{s_{2} }}\hat{W}_{2} } \right) - \tilde{W}_{4}^{T} \left( {\Pi_{2} (t)\frac{{\hat{\bar{\varsigma }}_{1}^{T} }}{{s_{1} }}W_{1} + \Pi^{\prime}_{2} (t)\frac{{\hat{\bar{\varsigma }}_{2}^{T} }}{{s_{2} }}W_{2} } \right) \hfill \\ \end{gathered}$$
(A.30)

where \(k_{1} = k^{\prime}_{1} + k^{\prime\prime}_{1}\), \(k_{2} = k^{\prime}_{2} + k^{\prime\prime}_{2}\), \(q = q_{1} + q_{2}\). Now, we define the actor NN tuning laws for the first and second agents as

$$\dot{\hat{W}}_{3} = - a_{3} \left( {\left( {B_{3} \hat{W}_{3} - B_{1} \hat{\bar{\varsigma }}_{1}^{T} \hat{W}_{1} } \right) - \Pi_{1} (t)\frac{{\hat{\bar{\varsigma }}_{1}^{T} }}{{s_{1} }}\hat{W}_{1} - \Pi^{\prime}_{1} (t)\frac{{\hat{\bar{\varsigma }}_{2}^{T} }}{{s_{2} }}\hat{W}_{2} } \right)$$
(A.31)
$$\dot{\hat{W}}_{4} = - a_{4} \left( {\left( {B_{4} \hat{W}_{4} - B_{2} \hat{\bar{\varsigma }}_{2}^{T} \hat{W}_{2} } \right) - \Pi_{2} (t)\frac{{\hat{\bar{\varsigma }}_{1}^{T} }}{{s_{1} }}\hat{W}_{1} - \Pi^{\prime}_{2} (t)\frac{{\hat{\bar{\varsigma }}_{2}^{T} }}{{s_{2} }}\hat{W}_{2} } \right)$$
(A.32)

These add to \(\dot{L}\) the terms

$$\begin{gathered} \tilde{W}_{3}^{T} B_{3} W_{1} - \tilde{W}_{3}^{T} B_{3} \tilde{W}_{3} - \tilde{W}_{3}^{T} B_{1} \hat{\bar{\varsigma }}_{1}^{T} W_{1} + \tilde{W}_{3}^{T} B_{1} \hat{\bar{\varsigma }}_{1}^{T} \tilde{W}_{1} + \hfill \\ \tilde{W}_{4}^{T} B_{4} W_{2} - \tilde{W}_{4}^{T} B_{4} \tilde{W}_{4} - \tilde{W}_{4}^{T} B_{2} \hat{\bar{\varsigma }}_{2}^{T} W_{2} + \tilde{W}_{4}^{T} B_{2} \hat{\bar{\varsigma }}_{2}^{T} \tilde{W}_{2} \hfill \\ \end{gathered}$$
(A.33)

Using (A.33) and applying Young's inequality [40] to the terms \(\tilde{W}_{3}^{T} B_{1} \hat{\bar{\varsigma }}_{1}^{T} \tilde{W}_{1}\) and \(\tilde{W}_{4}^{T} B_{2} \hat{\bar{\varsigma }}_{2}^{T} \tilde{W}_{2}\), \(\dot{L}\) becomes

$$\begin{gathered} \dot{L} < - x^{T} qx + k_{1} \left\| x \right\| + k_{2} \hfill \\ - \tilde{W}_{1}^{T} \left[ {\hat{\bar{\varsigma }}_{1} \hat{\bar{\varsigma }}_{1}^{T} + \sum\limits_{k = 1}^{l} {\hat{\bar{\varsigma }}_{1k} \hat{\varsigma }_{1k}^{T} } } \right]\tilde{W}_{1} + \tilde{W}_{1}^{T} \left( {\frac{{\hat{\bar{\varsigma }}_{1} }}{{s_{1} }}\Xi_{1} (t) + \sum\limits_{k = 1}^{l} {\frac{{\hat{\bar{\varsigma }}_{1k} }}{{s_{1k} }}} \left( {T_{1k} + \Xi_{1k} } \right)} \right) \hfill \\ - \tilde{W}_{3}^{T} B_{3} \tilde{W}_{3} + \tilde{W}_{3}^{T} B_{3} W_{1} - \tilde{W}_{3}^{T} B_{1} \hat{\bar{\varsigma }}_{1}^{T} W_{1} + \frac{1}{2}\tilde{W}_{3}^{T} B_{1} B_{1}^{T} \tilde{W}_{3} + \frac{1}{2}\tilde{W}_{1}^{T} \hat{\bar{\varsigma }}_{1} \hat{\bar{\varsigma }}_{1}^{T} \tilde{W}_{1} \hfill \\ - \tilde{W}_{2}^{T} \left[ {\hat{\bar{\varsigma }}_{2} \hat{\bar{\varsigma }}_{2}^{T} + \sum\limits_{k = 1}^{l} {\hat{\bar{\varsigma }}_{2k} \hat{\varsigma }_{2k}^{T} } } \right]\tilde{W}_{2} + \tilde{W}_{2}^{T} \left( {\frac{{\hat{\bar{\varsigma }}_{2} }}{{s_{2} }}\Xi_{2} (t) + \sum\limits_{k = 1}^{l} {\frac{{\hat{\bar{\varsigma }}_{2k} }}{{s_{2k} }}} \left( {T_{2k} + \Xi_{2k} } \right)} \right) \hfill \\ - \tilde{W}_{4}^{T} B_{4} \tilde{W}_{4} + \tilde{W}_{4}^{T} B_{4} W_{2} - \tilde{W}_{4}^{T} B_{2} \hat{\bar{\varsigma }}_{2}^{T} W_{2} + \frac{1}{2}\tilde{W}_{4}^{T} B_{2} B_{2}^{T} \tilde{W}_{4} + \frac{1}{2}\tilde{W}_{2}^{T} \hat{\bar{\varsigma }}_{2} \hat{\bar{\varsigma }}_{2}^{T} \tilde{W}_{2} \hfill \\ - \tilde{W}_{3}^{T} \left( {\Pi_{1} (t)\frac{{\hat{\bar{\varsigma }}_{1}^{T} }}{{s_{1} }}W_{1} + \Pi^{\prime}_{1} (t)\frac{{\hat{\bar{\varsigma }}_{2}^{T} }}{{s_{2} }}W_{2} } \right) - \tilde{W}_{4}^{T} \left( {\Pi_{2} (t)\frac{{\hat{\bar{\varsigma }}_{1}^{T} }}{{s_{1} }}W_{1} + \Pi^{\prime}_{2} (t)\frac{{\hat{\bar{\varsigma }}_{2}^{T} }}{{s_{2} }}W_{2} } \right) \hfill \\ \end{gathered}$$
(A.34)
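The Young's inequality step used to reach (A.34) can be sanity-checked numerically. The sketch below assumes, as the dimensions in (A.31) suggest, that \(B_{1}\) is a column vector of the same size as \(\tilde{W}_{3}\), so that \(\tilde{W}_{3}^{T} B_{1}\) and \(\hat{\bar{\varsigma }}_{1}^{T} \tilde{W}_{1}\) are both scalars. The vector sizes and the random samples are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
worst_gap = np.inf

for _ in range(1000):
    w3 = rng.normal(size=4)   # stands in for the actor weight error W~_3
    b1 = rng.normal(size=4)   # stands in for the design vector B_1
    sig = rng.normal(size=3)  # stands in for the normalized regressor
    w1 = rng.normal(size=3)   # stands in for the critic weight error W~_1

    cross = (w3 @ b1) * (sig @ w1)                          # W3~^T B1 sig^T W1~
    bound = 0.5 * (w3 @ b1) ** 2 + 0.5 * (sig @ w1) ** 2    # right-hand side
    worst_gap = min(worst_gap, bound - cross)

print(worst_gap)
```

The gap `bound - cross` equals half the squared difference of the two scalars, so it is nonnegative up to rounding. This is why the cross term can be replaced by the two quadratic terms \(\frac{1}{2}\tilde{W}_{3}^{T} B_{1} B_{1}^{T} \tilde{W}_{3}\) and \(\frac{1}{2}\tilde{W}_{1}^{T} \hat{\bar{\varsigma }}_{1} \hat{\bar{\varsigma }}_{1}^{T} \tilde{W}_{1}\).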

Denote \(N_{i} = \hat{\bar{\varsigma }}_{i} \hat{\bar{\varsigma }}_{i}^{T} + 2\sum\nolimits_{k = 1}^{l} {\hat{\bar{\varsigma }}_{ik} \hat{\varsigma }_{ik}^{T} }\) and \(\varGamma_{i} = \frac{{\hat{\bar{\varsigma }}_{i} }}{{s_{i} }}\Xi_{i} (t) + \sum\nolimits_{k = 1}^{l} {\frac{{\hat{\bar{\varsigma }}_{ik} }}{{s_{ik} }}} \left( {T_{ik} + \Xi_{ik} } \right)\) for \(i = 1,\,2\). If Condition 1 is satisfied, then \(N_{i}\) is positive definite and thus \(\dot{L}\) can be written as

$$\begin{gathered} \dot{L} < - x^{T} qx + k_{1} \left\| x \right\| + k_{2} \hfill \\ - 0.5\lambda_{\hbox{min} } (N_{1} )\tilde{W}_{1}^{T} \tilde{W}_{1} + \tilde{W}_{1}^{T} \varGamma_{1} - 0.5\lambda_{\hbox{min} } (N_{2} )\tilde{W}_{2}^{T} \tilde{W}_{2} + \tilde{W}_{2}^{T} \varGamma_{2} \hfill \\ - \tilde{W}_{3}^{T} \left( {B_{3} - \frac{1}{2}B_{1} B_{1}^{T} } \right)\tilde{W}_{3} + \tilde{W}_{3}^{T} \left( {B_{3} W_{1} + B_{1} \hat{\bar{\varsigma }}_{1}^{T} W_{1} + \Pi_{1} \frac{{\hat{\bar{\varsigma }}_{1}^{T} }}{{s_{1} }}W_{1} + \Pi^{\prime}_{1} \frac{{\hat{\bar{\varsigma }}_{2}^{T} }}{{s_{2} }}W_{2} } \right) \hfill \\ - \tilde{W}_{4}^{T} \left( {B_{4} - \frac{1}{2}B_{2} B_{2}^{T} } \right)\tilde{W}_{4} + \tilde{W}_{4}^{T} \left( {B_{4} W_{2} + B_{2} \hat{\bar{\varsigma }}_{2}^{T} W_{2} + \Pi_{2} \frac{{\hat{\bar{\varsigma }}_{1}^{T} }}{{s_{1} }}W_{1} + \Pi^{\prime}_{2} \frac{{\hat{\bar{\varsigma }}_{2}^{T} }}{{s_{2} }}W_{2} } \right) \hfill \\ \end{gathered}$$
(A.35)

where \(\lambda_{\hbox{min} } (N_{i} )\) is the minimum eigenvalue of \(N_{i}\), \(i = 1,\,2\). Define \(c = B_{3} - \frac{1}{2}B_{1} B_{1}^{T}\) and \(d = B_{4} - \frac{1}{2}B_{2} B_{2}^{T}\).
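
The critic-error terms of (A.35) can be traced back to (A.34) by merging the two quadratic forms in \(\tilde{W}_{i}\). The following sketch spells out that step for \(i = 1,\,2\). It assumes that \(N_{i}\) is symmetric (an assumption not stated explicitly above), so that the Rayleigh-quotient bound applies:

$$\begin{aligned} - \tilde{W}_{i}^{T} \left[ {\hat{\bar{\varsigma }}_{i} \hat{\bar{\varsigma }}_{i}^{T} + \sum\limits_{k = 1}^{l} {\hat{\bar{\varsigma }}_{ik} \hat{\varsigma }_{ik}^{T} } } \right]\tilde{W}_{i} + \frac{1}{2}\tilde{W}_{i}^{T} \hat{\bar{\varsigma }}_{i} \hat{\bar{\varsigma }}_{i}^{T} \tilde{W}_{i} &= - \frac{1}{2}\tilde{W}_{i}^{T} N_{i} \tilde{W}_{i} \\ &\le - \frac{1}{2}\lambda_{\hbox{min} } (N_{i} )\tilde{W}_{i}^{T} \tilde{W}_{i} \end{aligned}$$

The same grouping applied to the actor errors gives \(- \tilde{W}_{3}^{T} B_{3} \tilde{W}_{3} + \frac{1}{2}\tilde{W}_{3}^{T} B_{1} B_{1}^{T} \tilde{W}_{3} = - \tilde{W}_{3}^{T} c\,\tilde{W}_{3}\), and likewise \(- \tilde{W}_{4}^{T} d\,\tilde{W}_{4}\) for the second player.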

If we choose the design parameters \(B_{1} ,\,B_{2} ,\,B_{3} ,\,B_{4}\) such that \(c > 0\) and \(d > 0\), then the derivative of the Lyapunov function is less than zero if

$$\left\| x \right\| > \frac{{k_{1} }}{{2\lambda_{\hbox{min} } (q)}} + \sqrt {\frac{{k_{1}^{2} }}{{4\lambda_{\hbox{min} }^{2} (q)}} + \frac{{k_{2} }}{{\lambda_{\hbox{min} } (q)}}}$$
(A.36)
$$\left\| {\tilde{W}_{1} } \right\| > \frac{{2\varGamma_{1} }}{{\lambda_{\hbox{min} } (N_{1} )}}$$
(A.37)
$$\left\| {\tilde{W}_{2} } \right\| > \frac{{2\varGamma_{2} }}{{\lambda_{\hbox{min} } (N_{2} )}}$$
(A.38)
$$\left\| {\tilde{W}_{3} } \right\| > \frac{{B_{3} W_{1} + B_{1} \hat{\bar{\varsigma }}_{1}^{T} W_{1} + \Pi_{1} \frac{{\hat{\bar{\varsigma }}_{1}^{T} }}{{s_{1} }}W_{1} + \Pi^{\prime}_{1} \frac{{\hat{\bar{\varsigma }}_{2}^{T} }}{{s_{2} }}W_{2} }}{c}$$
(A.39)
$$\left\| {\tilde{W}_{4} } \right\| > \frac{{B_{4} W_{2} + B_{2} \hat{\bar{\varsigma }}_{2}^{T} W_{2} + \Pi_{2} \frac{{\hat{\bar{\varsigma }}_{1}^{T} }}{{s_{1} }}W_{1} + \Pi^{\prime}_{2} \frac{{\hat{\bar{\varsigma }}_{2}^{T} }}{{s_{2} }}W_{2} }}{d}$$
(A.40)
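
Conditions (A.36) and (A.37) can be recovered by treating each group of terms in (A.35) as a scalar quadratic. The derivation below is a sketch under the assumptions that \(q\) is positive definite, that \(k_{1} ,\,k_{2} \ge 0\), and that \(\varGamma_{1}\) on the right-hand side of (A.37) is read as its norm \(\left\| {\varGamma_{1} } \right\|\); these readings are not stated explicitly above. For the state terms,

$$\begin{aligned} - x^{T} qx + k_{1} \left\| x \right\| + k_{2} &\le - \lambda_{\hbox{min} } (q)\left\| x \right\|^{2} + k_{1} \left\| x \right\| + k_{2} , \\ - \lambda_{\hbox{min} } (q)\left\| x \right\|^{2} + k_{1} \left\| x \right\| + k_{2} < 0 \;&\Leftrightarrow\; \left\| x \right\| > \frac{{k_{1} + \sqrt {k_{1}^{2} + 4\lambda_{\hbox{min} } (q)k_{2} } }}{{2\lambda_{\hbox{min} } (q)}} = \frac{{k_{1} }}{{2\lambda_{\hbox{min} } (q)}} + \sqrt {\frac{{k_{1}^{2} }}{{4\lambda_{\hbox{min} }^{2} (q)}} + \frac{{k_{2} }}{{\lambda_{\hbox{min} } (q)}}} \end{aligned}$$

and for the first critic, by the Cauchy-Schwarz inequality,

$$- \frac{1}{2}\lambda_{\hbox{min} } (N_{1} )\left\| {\tilde{W}_{1} } \right\|^{2} + \tilde{W}_{1}^{T} \varGamma_{1} \le - \left\| {\tilde{W}_{1} } \right\|\left( {\frac{1}{2}\lambda_{\hbox{min} } (N_{1} )\left\| {\tilde{W}_{1} } \right\| - \left\| {\varGamma_{1} } \right\|} \right) < 0\quad {\text{if}}\quad \left\| {\tilde{W}_{1} } \right\| > \frac{{2\left\| {\varGamma_{1} } \right\|}}{{\lambda_{\hbox{min} } (N_{1} )}}$$

Condition (A.38) follows from the same argument applied to \(\tilde{W}_{2}\). Conditions (A.39) and (A.40) follow from an analogous argument with \(c\) and \(d\) (or their minimum eigenvalues, if they are matrix-valued) in place of \(\frac{1}{2}\lambda_{\hbox{min} } (N_{i} )\).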

Thus, using standard Lyapunov theory, all the critic and actor NN weight estimation errors are UUB, and the system states are guaranteed to never leave their initial compact set.

This completes the proof. \(\square\)

Appendix B

Proof of Theorem 2

a. Consider all the UUB weight errors in Theorem 2. The approximate constrained coupled HJ equations are

$$\begin{gathered} H_{1} \left( {x,\hat{W}_{1} ,\hat{u}_{1} ,\hat{u}_{2} } \right) = Q_{1} (x) + \hat{W}_{1}^{T} \nabla \sigma_{1} f - \hat{W}_{1}^{T} \nabla \sigma_{1} g_{1} \bar{u}_{1} \tanh (\hat{D}_{1} ) - \hat{W}_{1}^{T} \nabla \sigma_{1} g_{2} \bar{u}_{2} \tanh (\hat{D}_{2} ) \hfill \\ + \hat{W}_{3}^{T} \nabla \sigma_{1} g_{1} \bar{u}_{1} \tanh (\hat{D}_{1} ) - \hat{W}_{3}^{T} \nabla \sigma_{1} g_{1} \bar{u}_{1} \tanh (\delta \hat{D}_{1} ) + \bar{u}_{1}^{2} \bar{R}_{11} \bar{\varepsilon }_{{\hat{D}_{1} }} \hfill \\ + \hat{W}_{4}^{T} \nabla \sigma_{2} g_{2} \bar{u}_{2} R_{22}^{ - T} R_{12} \tanh (\hat{D}_{2} ) - \hat{W}_{4}^{T} \nabla \sigma_{2} g_{2} \bar{u}_{2} R_{22}^{ - T} R_{12} \tanh (\delta \hat{D}_{2} ) + \bar{u}_{2}^{2} \bar{R}_{12} \bar{\varepsilon }_{{\hat{D}_{2} }} \hfill \\ \end{gathered}$$
(B.1)
$$\begin{gathered} H_{2} \left( {x,\hat{W}_{2} ,\hat{u}_{1} ,\hat{u}_{2} } \right) = Q_{2} (x) + \hat{W}_{2}^{T} \nabla \sigma_{2} f - \hat{W}_{2}^{T} \nabla \sigma_{2} g_{1} \bar{u}_{1} \tanh (\hat{D}_{1} ) - \hat{W}_{2}^{T} \nabla \sigma_{2} g_{2} \bar{u}_{2} \tanh (\hat{D}_{2} ) \hfill \\ + \hat{W}_{3}^{T} \nabla \sigma_{1} g_{1} \bar{u}_{1} R_{11}^{ - T} R_{21} \tanh (\hat{D}_{1} ) - \hat{W}_{3}^{T} \nabla \sigma_{1} g_{1} \bar{u}_{1} R_{11}^{ - T} R_{21} \tanh (\delta \hat{D}_{1} ) + \bar{u}_{1}^{2} \bar{R}_{21} \bar{\varepsilon }_{{\hat{D}_{1} }} \hfill \\ + \hat{W}_{4}^{T} \nabla \sigma_{2} g_{2} \bar{u}_{2} \tanh (\hat{D}_{2} ) - \hat{W}_{4}^{T} \nabla \sigma_{2} g_{2} \bar{u}_{2} \tanh (\delta \hat{D}_{2} ) + \bar{u}_{2}^{2} \bar{R}_{22} \bar{\varepsilon }_{{\hat{D}_{2} }} \hfill \\ \end{gathered}$$
(B.2)

After adding zero by means of the HJ equations (17) and (18) and using the definitions \(\tilde{W}_{1} = W_{1} - \hat{W}_{1}\), \(\tilde{W}_{2} = W_{2} - \hat{W}_{2}\), \(\tilde{W}_{3} = W_{1} - \hat{W}_{3}\), \(\tilde{W}_{4} = W_{2} - \hat{W}_{4}\), one has

$$\begin{gathered} H_{1} \left( {x,\hat{W}_{1} ,\hat{u}_{1} ,\hat{u}_{2} } \right) = - \tilde{W}_{1}^{T} \nabla \sigma_{1} f + \tilde{W}_{1}^{T} \nabla \sigma_{1} g_{1} \bar{u}_{1} \tanh (\hat{D}_{1} ) + \tilde{W}_{1}^{T} \nabla \sigma_{1} g_{2} \bar{u}_{2} \tanh (\hat{D}_{2} ) \hfill \\ + \tilde{W}_{3}^{T} \nabla \sigma_{1} g_{1} \bar{u}_{1} \left( {\tanh (\delta \hat{D}_{1} ) - \tanh (\hat{D}_{1} )} \right) \hfill \\ + \tilde{W}_{4}^{T} \nabla \sigma_{2} g_{2} \bar{u}_{2} R_{22}^{ - T} R_{12} \left( {\tanh (\delta \hat{D}_{2} ) - \tanh (\hat{D}_{2} )} \right) \hfill \\ + W_{1}^{T} \nabla \sigma_{1} g_{2} \bar{u}_{2} \left( {\tanh (D_{2} ) - \tanh (\hat{D}_{2} )} \right) \hfill \\ + W_{1}^{T} \nabla \sigma_{1} g_{1} \bar{u}_{1} \left( {\tanh (\delta D_{1} ) - \tanh (\delta \hat{D}_{1} )} \right)\, \hfill \\ + W_{2}^{T} \nabla \sigma_{2} g_{2} \bar{u}_{2} R_{22}^{ - T} R_{12} \left( {\tanh (\delta D_{2} ) - \tanh (\delta \hat{D}_{2} )} \right) \hfill \\ + W_{2}^{T} \nabla \sigma_{2} g_{2} \bar{u}_{2} R_{22}^{ - T} R_{12} \left( {\tanh (\hat{D}_{2} ) - \tanh (D_{2} )} \right) \hfill \\ + \bar{u}_{1}^{2} \bar{R}_{11} \bar{\varepsilon }_{{\hat{D}_{1} }} + \bar{u}_{2}^{2} \bar{R}_{12} \bar{\varepsilon }_{{\hat{D}_{2} }} - \bar{u}_{1}^{2} \bar{R}_{11} \bar{\varepsilon }_{{D_{1} }} - \bar{u}_{2}^{2} \bar{R}_{12} \bar{\varepsilon }_{{D_{2} }} + \varepsilon_{CHJ1} \hfill \\ \end{gathered}$$
(B.3)
$$\begin{gathered} H_{2} \left( {x,\hat{W}_{2} ,\hat{u}_{1} ,\hat{u}_{2} } \right) = - \tilde{W}_{2}^{T} \nabla \sigma_{2} f + \tilde{W}_{2}^{T} \nabla \sigma_{2} g_{1} \bar{u}_{1} \tanh (\hat{D}_{1} ) + \tilde{W}_{2}^{T} \nabla \sigma_{2} g_{2} \bar{u}_{2} \tanh (\hat{D}_{2} ) \hfill \\ + \tilde{W}_{4}^{T} \nabla \sigma_{2} g_{2} \bar{u}_{2} \left( {\tanh (\delta \hat{D}_{2} ) - \tanh (\hat{D}_{2} )} \right) \hfill \\ + \tilde{W}_{3}^{T} \nabla \sigma_{1} g_{1} \bar{u}_{1} R_{11}^{ - T} R_{21} \left( {\tanh (\delta \hat{D}_{1} ) - \tanh (\hat{D}_{1} )} \right) \hfill \\ + W_{1}^{T} \nabla \sigma_{1} g_{1} \bar{u}_{1} R_{11}^{ - T} R_{21} \left( {\tanh (\hat{D}_{1} ) - \tanh (D_{1} )} \right) \hfill \\ + W_{1}^{T} \nabla \sigma_{1} g_{1} \bar{u}_{1} R_{11}^{ - T} R_{21} \left( {\tanh (\delta D_{1} ) - \tanh (\delta \hat{D}_{1} )} \right) \hfill \\ + W_{2}^{T} \nabla \sigma_{2} g_{1} \bar{u}_{1} \left( {\tanh (D_{1} ) - \tanh (\hat{D}_{1} )} \right) \hfill \\ + W_{2}^{T} \nabla \sigma_{2} g_{2} \bar{u}_{2} \left( {\tanh (\delta D_{2} ) - \tanh (\delta \hat{D}_{2} )} \right) \hfill \\ + \bar{u}_{1}^{2} \bar{R}_{21} \bar{\varepsilon }_{{\hat{D}_{1} }} + \bar{u}_{2}^{2} \bar{R}_{22} \bar{\varepsilon }_{{\hat{D}_{2} }} - \bar{u}_{1}^{2} \bar{R}_{21} \bar{\varepsilon }_{{D_{1} }} - \bar{u}_{2}^{2} \bar{R}_{22} \bar{\varepsilon }_{{D_{2} }} + \varepsilon_{CHJ2} \hfill \\ \end{gathered}$$
(B.4)

Now using Assumptions 1 and 2, taking norms in (B.3) and (B.4) reveals

$$\begin{gathered} \left\| {H_{1} \left( {x,\hat{W}_{1} ,\hat{u}_{1} ,\hat{u}_{2} } \right)} \right\| \le b_{f} b_{\sigma 1x} \left\| x \right\|\left\| {\tilde{W}_{1}^{T} } \right\| + b_{\sigma 1x} \left( {b_{g1} \bar{u}_{1} + b_{g2} \bar{u}_{2} } \right)\left\| {\tilde{W}_{1}^{T} } \right\| + b_{\sigma 1x} b_{g1} \bar{u}_{1} \left\| {\tilde{W}_{3}^{T} } \right\| \hfill \\ + b_{\sigma 2x} b_{g2} \bar{u}_{2} \lambda_{\hbox{max} } \left( {R_{22}^{ - T} } \right)\lambda_{\hbox{max} } \left( {R_{12} } \right)\left\| {\tilde{W}_{4}^{T} } \right\| \hfill \\ + b_{\sigma 1x} b_{g2} \bar{u}_{2} W_{1m}^{T} + b_{\sigma 1x} b_{g1} \bar{u}_{1} W_{1m}^{T} + 2b_{\sigma 2x} b_{g2} \bar{u}_{2} \lambda_{\hbox{max} } \left( {R_{22}^{ - T} } \right)\lambda_{\hbox{max} } \left( {R_{12} } \right)W_{2m}^{T} + \left\| {\varepsilon_{1} } \right\| \hfill \\ \end{gathered}$$
(B.5)
$$\begin{gathered} \left\| {H_{2} \left( {x,\hat{W}_{2} ,\hat{u}_{1} ,\hat{u}_{2} } \right)} \right\| \le b_{f} b_{\sigma 2x} \left\| x \right\|\left\| {\tilde{W}_{2}^{T} } \right\| + b_{\sigma 2x} b_{g1} \bar{u}_{1} \left\| {\tilde{W}_{2}^{T} } \right\| + b_{\sigma 2x} b_{g2} \bar{u}_{2} \left\| {\tilde{W}_{2}^{T} } \right\| \hfill \\ + b_{\sigma 2x} b_{g2} \bar{u}_{2} \left\| {\tilde{W}_{4}^{T} } \right\| + 2b_{\sigma 1x} b_{g1} \bar{u}_{1} \lambda_{\hbox{max} } \left( {R_{11}^{ - T} } \right)\lambda_{\hbox{max} } \left( {R_{21} } \right)\left\| {\tilde{W}_{3}^{T} } \right\| \hfill \\ + 2b_{\sigma 1x} b_{g1} \bar{u}_{1} \lambda_{\hbox{max} } \left( {R_{11}^{ - T} } \right)\lambda_{\hbox{max} } \left( {R_{21} } \right)W_{1m}^{T} + b_{\sigma 2x} b_{g1} \bar{u}_{1} W_{2m}^{T} + b_{\sigma 2x} b_{g2} \bar{u}_{2} W_{2m}^{T} + \left\| {\varepsilon_{2} } \right\| \hfill \\ \end{gathered}$$
(B.6)

where

$$\begin{gathered} \left\| {\varepsilon_{1} } \right\| \le \bar{u}_{1}^{2} \bar{R}_{11} \left( {\bar{\varepsilon }_{{\hat{D}_{1} }} - \bar{\varepsilon }_{{D_{1} }} } \right) + \bar{u}_{2}^{2} \bar{R}_{12} \left( {\bar{\varepsilon }_{{\hat{D}_{2} }} - \bar{\varepsilon }_{{D_{2} }} } \right) + \varepsilon_{h1} \hfill \\ \left\| {\varepsilon_{2} } \right\| \le \bar{u}_{1}^{2} \bar{R}_{21} \left( {\bar{\varepsilon }_{{\hat{D}_{1} }} - \bar{\varepsilon }_{{D_{1} }} } \right) + \bar{u}_{2}^{2} \bar{R}_{22} \left( {\bar{\varepsilon }_{{\hat{D}_{2} }} - \bar{\varepsilon }_{{D_{2} }} } \right) + \varepsilon_{h2}, \hfill \\ \end{gathered}$$

and \(\varepsilon_{h1}\), \(\varepsilon_{h2}\) are bounds for \(\varepsilon_{CHJ1}\), \(\varepsilon_{CHJ2}\), respectively. All the signals on the right-hand side of (B.5) and (B.6) are UUB. Therefore, \(\left\| {H_{1} \left( {x,\hat{W}_{1} ,\hat{u}_{1} ,\hat{u}_{2} } \right)} \right\|\) and \(\left\| {H_{2} \left( {x,\hat{W}_{2} ,\hat{u}_{1} ,\hat{u}_{2} } \right)} \right\|\) are UUB, and convergence to the approximate coupled HJ solutions is obtained.

b. Consider \(\hat{u}_{1}\) and \(\hat{u}_{2}\) in (25) and (26). Then one has

$$\begin{gathered} \left\| {u_{1} - \hat{u}_{1} } \right\| = \bar{u}_{1} \left\| { - \tanh \left( {1/(2\bar{u}_{1} )R_{11}^{ - 1} g_{1}^{T} \nabla \sigma_{1}^{T} W_{1} } \right) + \tanh \left( {1/(2\bar{u}_{1} )R_{11}^{ - 1} g_{1}^{T} \nabla \sigma_{1}^{T} \hat{W}_{3} } \right)} \right\| \hfill \\ \le \bar{u}_{1} \left\| { - \tanh \left( {1/(2\bar{u}_{1} )R_{11}^{ - 1} g_{1}^{T} \nabla \sigma_{1}^{T} W_{1} } \right) + \tanh \left( {1/(2\bar{u}_{1} )R_{11}^{ - 1} g_{1}^{T} \nabla \sigma_{1}^{T} \left( {W_{1} - \tilde{W}_{3} } \right)} \right)} \right\| \hfill \\ \end{gathered}$$
(B.7)
$$\begin{gathered} \left\| {u_{2} - \hat{u}_{2} } \right\| = \bar{u}_{2} \left\| { - \tanh \left( {1/(2\bar{u}_{2} )R_{22}^{ - 1} g_{2}^{T} \nabla \sigma_{2}^{T} W_{2} } \right) + \tanh \left( {1/(2\bar{u}_{2} )R_{22}^{ - 1} g_{2}^{T} \nabla \sigma_{2}^{T} \hat{W}_{4} } \right)} \right\| \hfill \\ \le \bar{u}_{2} \left\| { - \tanh \left( {1/(2\bar{u}_{2} )R_{22}^{ - 1} g_{2}^{T} \nabla \sigma_{2}^{T} W_{2} } \right) + \tanh \left( {1/(2\bar{u}_{2} )R_{22}^{ - 1} g_{2}^{T} \nabla \sigma_{2}^{T} \left( {W_{2} - \tilde{W}_{4} } \right)} \right)} \right\| \hfill \\ \end{gathered}$$
(B.8)
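
The bound implied by (B.7) can be made explicit with the Lipschitz property of the hyperbolic tangent. The sketch below treats player 1 and uses the bounds \(\left\| {\nabla \sigma_{1} } \right\| \le b_{\sigma 1x}\) and \(\left\| {g_{1} } \right\| \le b_{g1}\) that appear in (B.5); the use of the induced norm \(\left\| {R_{11}^{ - 1} } \right\|\) and the condition \(\bar{u}_{1} > 0\) are assumptions of this sketch. Since \(\left| {\tanh a - \tanh b} \right| \le \left| {a - b} \right|\) holds componentwise,

$$\begin{aligned} \left\| {u_{1} - \hat{u}_{1} } \right\| &\le \bar{u}_{1} \left\| {\frac{1}{{2\bar{u}_{1} }}R_{11}^{ - 1} g_{1}^{T} \nabla \sigma_{1}^{T} \tilde{W}_{3} } \right\| = \frac{1}{2}\left\| {R_{11}^{ - 1} g_{1}^{T} \nabla \sigma_{1}^{T} \tilde{W}_{3} } \right\| \\ &\le \frac{1}{2}\left\| {R_{11}^{ - 1} } \right\|b_{g1} b_{\sigma 1x} \left\| {\tilde{W}_{3} } \right\| \end{aligned}$$

so the control error inherits the ultimate bound on \(\tilde{W}_{3}\). The same argument with index 2 and \(\tilde{W}_{4}\) applies to (B.8).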

Hence, \(\left\| {u_{1} - \hat{u}_{1} } \right\|\) and \(\left\| {u_{2} - \hat{u}_{2} } \right\|\) are UUB. Therefore, the pair \(\left( {\hat{u}_{1} ,\hat{u}_{2} } \right)\) gives the approximate Nash equilibrium solution of the game. This completes the proof. \(\square\)

About this article

Cite this article

Yasini, S., Naghibi Sitani, M.B. & Kirampor, A. Reinforcement learning and neural networks for multi-agent nonzero-sum games of nonlinear constrained-input systems. Int. J. Mach. Learn. & Cyber. 7, 967–980 (2016). https://doi.org/10.1007/s13042-014-0300-y

  • DOI: https://doi.org/10.1007/s13042-014-0300-y
