Online adaptive Q-learning method for fully cooperative linear quadratic dynamic games

Li, Xinxing; Peng, Zhihong; Jiao, Lei; Xi, Lele; Cai, Junqi

doi:10.1007/s11432-018-9865-9

Research Paper
Published: 12 November 2019

Volume 62, article number 222201, (2019)
Science China Information Sciences


Abstract

A model-based offline policy iteration (PI) algorithm and a model-free online Q-learning algorithm are proposed for solving fully cooperative linear quadratic dynamic games. The PI-based adaptive Q-learning method can learn the feedback Nash equilibrium online from state samples generated by behavior policies, without querying the system model. Unlike existing Q-learning methods, the proposed algorithm performs both policy evaluation and policy improvement adaptively. We prove the convergence of the offline PI algorithm by showing that it is equivalent to Newton's method for solving the game algebraic Riccati equation (GARE). Furthermore, we prove that the proposed Q-learning method converges to the Nash equilibrium for a sufficiently small learning rate, provided that certain persistence of excitation conditions hold; these conditions can be easily met by choosing suitable behavior policies. Simulation results demonstrate the good performance of the proposed online adaptive Q-learning algorithm.
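The offline PI step can be illustrated with a minimal sketch. It assumes a discrete-time game with a scalar state, x_{k+1} = a x_k + Σ_i b_i u_{i,k}, a common stage cost q x² + Σ_i r_i u_i², and linear policies u_i = -k_i x. Under these assumptions the fully cooperative game reduces to a single LQ problem with stacked input vector b = [b_1, ..., b_N]^T and R = diag(r_1, ..., r_N), and the loop below is Hewer's Newton-type policy iteration on that problem. All function names and numbers are hypothetical, the joint policy update may differ in detail from the paper's PI algorithm, and the model-free Q-learning variant (which replaces the model-based evaluation with sampled data) is not shown.

```python
from typing import List, Tuple


def evaluate_policy(a: float, b: List[float], q: float, r: List[float],
                    k: List[float]) -> float:
    """Policy evaluation: solve the scalar Lyapunov equation
    p = q + sum_i r_i k_i^2 + (a - sum_i b_i k_i)^2 p
    for the common value coefficient p (V(x) = p x^2) of the joint policy u_i = -k_i x."""
    a_cl = a - sum(bi * ki for bi, ki in zip(b, k))  # closed-loop factor
    if abs(a_cl) >= 1.0:
        raise ValueError("joint policy is not stabilizing (|a_cl| >= 1)")
    stage = q + sum(ri * ki * ki for ri, ki in zip(r, k))
    return stage / (1.0 - a_cl * a_cl)


def improve_policy(a: float, b: List[float], r: List[float],
                   p: float) -> List[float]:
    """Policy improvement: joint minimizer k = (R + p b b^T)^{-1} b p a.
    Because R = diag(r_i), Sherman-Morrison gives k_i = b_i p a / (r_i (1 + p s)),
    with s = sum_i b_i^2 / r_i."""
    s = sum(bi * bi / ri for bi, ri in zip(b, r))
    return [bi * p * a / (ri * (1.0 + p * s)) for bi, ri in zip(b, r)]


def policy_iteration(a: float, b: List[float], q: float, r: List[float],
                     k0: List[float], tol: float = 1e-12,
                     max_iter: int = 100) -> Tuple[float, List[float]]:
    """Alternate evaluation and improvement from a stabilizing initial policy k0."""
    k = list(k0)
    p_prev = float("inf")
    p = p_prev
    for _ in range(max_iter):
        p = evaluate_policy(a, b, q, r, k)
        k = improve_policy(a, b, r, p)
        if abs(p - p_prev) < tol:  # stop once the value coefficient settles
            break
        p_prev = p
    return p, k


if __name__ == "__main__":
    # Hypothetical two-player example: open-loop unstable (a = 1.2);
    # the initial joint policy gives closed-loop factor 1.2 - 0.5 = 0.7.
    p, k = policy_iteration(a=1.2, b=[1.0, 0.5], q=1.0, r=[1.0, 2.0], k0=[0.5, 0.0])
    print("p =", p, "gains =", k)
```

Because the open-loop factor a = 1.2 is unstable, the initial gains k0 must be stabilizing. At convergence, p should satisfy the scalar Riccati equation p = q + a²p / (1 + s p) with s = Σ_i b_i²/r_i, which is what the stacked-input GARE reduces to in this scalar setting.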



Acknowledgements

This work was supported by the Key Program of the National Natural Science Foundation of China (Grant No. U1613225).

Author information

Authors and Affiliations

School of Automation, Beijing Institute of Technology, Beijing, 100081, China
Xinxing Li, Zhihong Peng, Lei Jiao, Lele Xi & Junqi Cai
State Key Laboratory of Intelligent Control and Decision of Complex Systems, Beijing, 100081, China
Xinxing Li, Zhihong Peng, Lei Jiao, Lele Xi & Junqi Cai

Corresponding author

Correspondence to Zhihong Peng.

About this article

Cite this article

Li, X., Peng, Z., Jiao, L. et al. Online adaptive Q-learning method for fully cooperative linear quadratic dynamic games. Sci. China Inf. Sci. 62, 222201 (2019). https://doi.org/10.1007/s11432-018-9865-9

Received: 19 December 2018
Revised: 04 March 2019
Accepted: 29 March 2019
Published: 12 November 2019
DOI: https://doi.org/10.1007/s11432-018-9865-9
