Iterative ADP learning algorithms for discrete-time multi-player games

Jiang, He; Zhang, Huaguang

doi:10.1007/s10462-017-9603-1

Published: 12 January 2018

Artificial Intelligence Review, volume 50, pages 75–91 (2018)

Abstract

Adaptive dynamic programming (ADP) is an important branch of reinforcement learning for solving optimal control problems. Most practical nonlinear systems are driven by more than one controller. Each controller can be regarded as a player, and the tradeoff between cooperation and conflict among these players can be viewed as a game. Multi-player games fall into two main categories: zero-sum games and non-zero-sum games. To obtain the optimal control policy for each player, one needs to solve Hamilton–Jacobi–Isaacs equations for zero-sum games and a set of coupled Hamilton–Jacobi equations for non-zero-sum games. Unfortunately, these equations are generally difficult or even impossible to solve analytically. To overcome this bottleneck, this paper proposes two ADP methods: a modified gradient-descent-based online algorithm and a novel iterative offline learning approach. Furthermore, to implement the proposed methods, we employ a single-network structure, which reduces the computational burden compared with the traditional multiple-network architecture. Simulation results demonstrate the effectiveness of our schemes.
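
As a rough illustration of the kind of fixed-point equation the abstract refers to, the sketch below runs value iteration on a scalar linear-quadratic discrete-time zero-sum game, where the Hamilton–Jacobi–Isaacs equation reduces to a game algebraic Riccati equation. This is not the algorithm proposed in the paper. The dynamics x_{k+1} = a x_k + b u_k + e w_k, the stage cost q x^2 + r u^2 - gamma^2 w^2, the function names, and the numerical values are all assumptions chosen for illustration.

```python
def gare_step(p, a, b, e, q, r, gamma):
    """One value-iteration update of V(x) = p * x**2 (hypothetical helper).

    Assumed game: the control u minimizes and the disturbance w maximizes
    the sum of q*x^2 + r*u^2 - gamma^2*w^2 along x_next = a*x + b*u + e*w.
    """
    # Quadratic form in (u, w) obtained by expanding p * (a*x + b*u + e*w)^2.
    m11 = r + b * b * p
    m12 = b * e * p
    m22 = e * e * p - gamma ** 2
    if m11 <= 0 or m22 >= 0:
        raise ValueError("no saddle point: need r + b^2 p > 0 and e^2 p < gamma^2")
    det = m11 * m22 - m12 * m12
    # Cross terms between x and (u, w).
    n1 = a * b * p
    n2 = a * e * p
    # Saddle-point policies u = -k_u * x and w = -k_w * x, from [k_u, k_w] = M^{-1} n.
    k_u = (m22 * n1 - m12 * n2) / det
    k_w = (m11 * n2 - m12 * n1) / det
    # Substitute the saddle point back into the one-step lookahead.
    p_next = q + a * a * p - (n1 * k_u + n2 * k_w)
    return p_next, k_u, k_w


def solve_scalar_zero_sum_game(a, b, e, q, r, gamma, max_iters=1000, tol=1e-12):
    """Iterate gare_step from V_0 = 0 until the value coefficient settles."""
    p = 0.0
    for _ in range(max_iters):
        p_next, k_u, k_w = gare_step(p, a, b, e, q, r, gamma)
        if abs(p_next - p) < tol:
            return p_next, k_u, k_w
        p = p_next
    raise RuntimeError("value iteration did not converge")


if __name__ == "__main__":
    # Illustrative parameters only; they do not come from the paper.
    p, k_u, k_w = solve_scalar_zero_sum_game(a=0.9, b=1.0, e=0.5, q=1.0, r=1.0, gamma=5.0)
    print("value coefficient p:", p)
    print("control gain k_u:", k_u, "disturbance gain k_w:", k_w)
    print("closed-loop pole:", 0.9 - 1.0 * k_u - 0.5 * k_w)
```

In the nonlinear setting the value function has no closed form like p x^2, which is where an approximator such as the single network mentioned in the abstract would stand in for the scalar p; that substitution is not shown here.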

Acknowledgements

This work is supported by the National Natural Science Foundation of China (61433004, 61627809, 61621004), and IAPI Fundamental Research Funds 2013ZCX14.

Author information

Authors and Affiliations

College of Information Science and Engineering, Northeastern University, Shenyang, 110819, Liaoning, People’s Republic of China
He Jiang & Huaguang Zhang

Corresponding author

Correspondence to Huaguang Zhang.

About this article

Cite this article

Jiang, H., Zhang, H. Iterative ADP learning algorithms for discrete-time multi-player games. Artif Intell Rev 50, 75–91 (2018). https://doi.org/10.1007/s10462-017-9603-1

Published: 12 January 2018
Issue Date: June 2018
DOI: https://doi.org/10.1007/s10462-017-9603-1
