
Online solving Nash equilibrium solution of N-player nonzero-sum differential games via recursive least squares

  • Optimization
Soft Computing

Abstract

A novel online critic neural network weight-tuning algorithm is proposed in this paper. It combines policy iteration with recursive least squares (RLS) to solve the optimal control problem for the players of nonlinear systems in nonzero-sum games. The coupling between players and the nonlinearity of the system make it difficult to solve the Hamiltonian equations directly. Taking a linear regression perspective, this paper treats any admissible control and its corresponding value function as the input and output of a regression affected by perturbations. RLS processes only the current data and stores the covariance matrix of the historical data to adjust the weights. This greatly simplifies the computation: it avoids the memory cost of storing large amounts of historical data and the time spent collecting it. A discount factor is introduced into the cumulative error so that the total error does not grow toward infinity and so that historical policy evaluations do not dominate the update; without it, the covariance matrix tends to zero and loses its ability to adjust the weights. When the error of the Hamiltonian equation is zero, the proposed tuning law also tends to zero. The stability of the covariance matrix is then analysed, and the convergence of the weight errors is proved. Finally, two simulations verify the effectiveness of the proposed RLS-based algorithm for computing the Nash equilibrium solution online.
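The core idea of treating the critic's Hamiltonian residual as a linear regression and updating the weights with RLS under a discount (forgetting) factor can be illustrated with a generic discrete-time sketch. It assumes a linear-in-parameters critic V(x) ≈ W^T phi(x), so that each sample supplies a regressor vector and a scalar target with target ≈ regressor^T W. The names `rls_update` and `demo`, the forgetting factor `lam`, the initial covariance, and the synthetic data are all hypothetical. This is the textbook exponentially weighted RLS, not the paper's own continuous-time tuning law.

```python
import math


def rls_update(w, P, phi, y, lam):
    """One step of exponentially weighted RLS with forgetting factor lam in (0, 1].

    Recursively minimises sum_j lam**(k - j) * (y_j - phi_j . w)**2 while storing
    only the current estimate w and the n-by-n covariance-like matrix P
    (assumed symmetric), instead of the whole data history.
    """
    n = len(w)
    # P @ phi
    P_phi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(phi[i] * P_phi[i] for i in range(n))
    gain = [v / denom for v in P_phi]
    # A-priori prediction error; for a critic this plays the role of the
    # Hamiltonian residual evaluated with the old weights.
    err = y - sum(phi[i] * w[i] for i in range(n))
    w_new = [w[i] + gain[i] * err for i in range(n)]
    # P <- (P - gain * phi^T P) / lam, using phi^T P = (P phi)^T for symmetric P.
    P_new = [[(P[i][j] - gain[i] * P_phi[j]) / lam for j in range(n)]
             for i in range(n)]
    return w_new, P_new, err


def demo(steps=300, lam=0.98):
    """Fit hypothetical 'true' weights from noise-free, persistently exciting data."""
    w_true = [2.0, -1.0, 0.5]  # hypothetical target weights
    w = [0.0, 0.0, 0.0]
    # Large initial covariance = weak prior on the initial guess.
    P = [[1000.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    for k in range(steps):
        # Sinusoids at distinct frequencies plus a bias term keep the data exciting.
        phi = [math.sin(0.3 * k), math.cos(0.7 * k), 1.0]
        y = sum(a * b for a, b in zip(phi, w_true))
        w, P, _ = rls_update(w, P, phi, y, lam)
    return w


if __name__ == "__main__":
    print([round(v, 4) for v in demo()])
```

Only `w` and `P` are carried between samples, which is the memory saving described above. With `lam = 1` and persistently exciting data, `P` shrinks toward zero and the gain vanishes, which is the loss of adjustment that the discount factor is meant to prevent.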


Algorithms 1–2 and Figs. 1–7 appear in the full text.


Data availability

Enquiries about data availability should be directed to the authors.


Funding

This work was supported in part by the National Natural Science Foundation of China under Grants 62273036, 61873300, and in part by Interdisciplinary Research Project for Young Teachers of USTB (Fundamental Research Funds for the Central Universities) under Grant FRF-IDRY-20-030.

Author information

Corresponding author

Correspondence to Ruizhuo Song.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Song, R., Yang, G. Online solving Nash equilibrium solution of N-player nonzero-sum differential games via recursive least squares. Soft Comput 27, 16659–16673 (2023). https://doi.org/10.1007/s00500-023-08934-y

  • DOI: https://doi.org/10.1007/s00500-023-08934-y
