Abstract
A novel online critic neural network weight adjustment algorithm is proposed in this paper, combining policy iteration and recursive least squares (RLS), to address the optimal control of players in nonlinear systems with nonzero-sum games. The interaction between players and the nonlinearity of the system make it difficult to solve the Hamiltonian function directly. From a linear regression perspective, this paper regards any admissible control and its corresponding value function as the input and output of a regression affected by perturbations. RLS processes only the current data and stores the covariance matrix of the historical data to adjust the weights. This greatly simplifies the computation, avoiding both the memory wasted on storing large amounts of historical data and the time wasted on data collection. When calculating the cumulative error, a discount factor is introduced for two reasons: it prevents the total error from tending to infinity, and it limits the influence of evaluations of historical policies, which would otherwise drive the covariance matrix to zero so that it loses its adjustment effect. When the error of the Hamiltonian equation is zero, the proposed adjustment law also tends to zero. The stability of the covariance matrix is then analysed, and the convergence of the weight errors is proved. Finally, two simulations verify the effectiveness of the proposed RLS-based algorithm for solving the Nash equilibrium online.
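To make the role of the covariance matrix and the discount factor concrete, the following is a minimal sketch of a generic exponentially weighted RLS estimator for linear-in-parameter critic weights. It is not the authors' adjustment law. It assumes a scalar regression y = σᵀW + noise, standing in for the Hamiltonian-error regression described above, and treats the discount factor as the standard RLS forgetting factor λ. The class name `DiscountedRLSCritic`, the parameters `forgetting` and `p0`, and the synthetic `run_demo` data are hypothetical illustrations. In a multi-player setting, one would presumably run one such estimator per player's critic.

```python
import numpy as np


class DiscountedRLSCritic:
    """Exponentially weighted RLS for critic weights W in y = sigma^T W + noise.

    Hypothetical sketch: the regressor sigma and target y stand in for the
    paper's Hamiltonian-error regression, which is not reproduced here.
    """

    def __init__(self, n_weights, forgetting=0.98, p0=1e3):
        if not 0.0 < forgetting <= 1.0:
            raise ValueError("forgetting factor must lie in (0, 1]")
        self.lam = forgetting                  # discount applied to past squared errors
        self.W = np.zeros(n_weights)           # current critic weight estimate
        self.P = p0 * np.eye(n_weights)        # covariance matrix: the only "history" kept

    def update(self, sigma, y):
        """Process one sample (sigma, y); return the a priori error."""
        sigma = np.asarray(sigma, dtype=float)
        err = y - sigma @ self.W               # zero error gives a zero weight change
        P_sigma = self.P @ sigma
        gain = P_sigma / (self.lam + sigma @ P_sigma)
        self.W = self.W + gain * err
        # Dividing by lam keeps P from collapsing to zero under persistent excitation
        self.P = (self.P - np.outer(gain, P_sigma)) / self.lam
        return err


def run_demo(forgetting, n_samples=500, seed=0):
    """Fit synthetic data y = sigma^T w_true + small noise (illustrative only)."""
    rng = np.random.default_rng(seed)
    w_true = np.array([1.0, -2.0, 0.5])
    critic = DiscountedRLSCritic(n_weights=3, forgetting=forgetting)
    for _ in range(n_samples):
        sigma = rng.standard_normal(3)
        y = sigma @ w_true + 0.01 * rng.standard_normal()
        critic.update(sigma, y)
    return critic, w_true
```

Only `W` and `P` are stored, so memory does not grow with the number of samples. With `forgetting=1.0` the covariance matrix shrinks roughly like 1/k. With `forgetting < 1` it settles at a nonzero level, so later data can still move the weights. This is the qualitative behaviour the abstract attributes to the discount factor.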
Data availability
Enquiries about data availability should be directed to the authors.
Funding
This work was supported in part by the National Natural Science Foundation of China under Grants 62273036, 61873300, and in part by Interdisciplinary Research Project for Young Teachers of USTB (Fundamental Research Funds for the Central Universities) under Grant FRF-IDRY-20-030.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Song, R., Yang, G. Online solving Nash equilibrium solution of N-player nonzero-sum differential games via recursive least squares. Soft Comput 27, 16659–16673 (2023). https://doi.org/10.1007/s00500-023-08934-y
DOI: https://doi.org/10.1007/s00500-023-08934-y