Online solving Nash equilibrium solution of N-player nonzero-sum differential games via recursive least squares

Song, Ruizhuo; Yang, Gaofu

doi:10.1007/s00500-023-08934-y

Optimization
Published: 04 August 2023

Soft Computing, Volume 27, pages 16659–16673 (2023)

Ruizhuo Song¹ & Gaofu Yang¹

Abstract

This paper proposes a novel online weight-tuning algorithm for critic neural networks that combines policy iteration with recursive least squares (RLS) to solve the optimal control problem for the players of a nonzero-sum game in a nonlinear system. The interaction between players and the nonlinearity of the system make it difficult to solve the Hamiltonian function directly. From a linear regression perspective, the paper treats any admissible control and its corresponding value function as an input and an output affected by perturbations. RLS processes only the current data and stores the covariance matrix of the historical data to adjust the weights. This greatly simplifies the calculation, because it avoids both the storage cost of keeping a large amount of historical data and the time cost of data collection. When the cumulative error is calculated, a discount factor is introduced so that the total error does not tend to infinity and so that the influence of historical policy evaluations, under which the covariance matrix tends to zero and loses its adjusting effect, is reduced. When the error of the Hamiltonian equation is zero, the proposed adjustment law also tends to zero. The stability of the covariance matrix is then analysed, and the convergence of the weight errors is proved. Finally, two simulations verify the effectiveness of the proposed RLS-based algorithm for solving the Nash equilibrium solution online.
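
The RLS idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' continuous-time critic tuning law: it is the generic discrete-time RLS update with a forgetting factor, used here as a stand-in for the discount factor mentioned above. The names `rls_step`, `lam`, `phi`, `true_w`, and `w_est`, the regressor signals, and all numerical values are assumptions chosen for illustration only.

```python
import math


def rls_step(w, P, phi, y, lam=0.98):
    """One generic RLS update with forgetting factor lam in (0, 1].

    w   : current weight estimate (list of floats)
    P   : current covariance matrix (symmetric list of lists)
    phi : current regressor vector
    y   : current scalar measurement, modelled as phi . w_true
    """
    n = len(w)
    # P @ phi; because P is symmetric this also equals (phi^T P)^T.
    P_phi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(phi[i] * P_phi[i] for i in range(n))
    gain = [v / denom for v in P_phi]
    # Prediction error of the current estimate on the current sample only.
    err = y - sum(w[i] * phi[i] for i in range(n))
    w_new = [w[i] + gain[i] * err for i in range(n)]
    # Dividing by lam discounts old data and keeps P from shrinking to zero.
    P_new = [[(P[i][j] - gain[i] * P_phi[j]) / lam for j in range(n)]
             for i in range(n)]
    return w_new, P_new, err


# Hypothetical two-weight regression problem (illustrative values only).
true_w = [2.0, -1.0]
w_est = [0.0, 0.0]
P = [[1000.0, 0.0], [0.0, 1000.0]]  # large initial covariance: weak prior

for k in range(200):
    # Two signals with different frequencies keep the regressor exciting.
    phi = [math.sin(0.3 * k), math.cos(0.7 * k)]
    y = true_w[0] * phi[0] + true_w[1] * phi[1]  # noise-free measurement
    w_est, P, _ = rls_step(w_est, P, phi, y)

print(w_est)
```

Only the current sample, the weight estimate, and the covariance matrix are kept between iterations, which mirrors the storage argument made in the abstract. With `lam = 1` the update reduces to ordinary RLS, where the covariance matrix keeps shrinking as data accumulate.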

Data availability

Enquiries about data availability should be directed to the authors.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grants 62273036 and 61873300, and in part by the Interdisciplinary Research Project for Young Teachers of USTB (Fundamental Research Funds for the Central Universities) under Grant FRF-IDRY-20-030.

Author information

Authors and Affiliations

School of Automation and Electrical Engineering, University of Science and Technology Beijing, 30 Xueyuan Road, Haidian District, Beijing, 100083, China
Ruizhuo Song & Gaofu Yang

Authors

Ruizhuo Song
Gaofu Yang

Corresponding author

Correspondence to Ruizhuo Song.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Song, R., Yang, G. Online solving Nash equilibrium solution of N-player nonzero-sum differential games via recursive least squares. Soft Comput 27, 16659–16673 (2023). https://doi.org/10.1007/s00500-023-08934-y

Accepted: 08 March 2023
Published: 04 August 2023
Issue Date: November 2023
DOI: https://doi.org/10.1007/s00500-023-08934-y
