
Model-Free Optimal Consensus Control for Multi-agent Systems Based on DHP Algorithm


Abstract

This paper develops a novel model-free dual heuristic dynamic programming (DHP) algorithm, combined with policy iteration and least-squares techniques, to achieve optimal consensus control of discrete-time multi-agent systems. Optimal consensus control requires solving the coupled Hamilton-Jacobi-Bellman (HJB) equations, which is generally difficult, especially when the mathematical models are unknown. To overcome these difficulties, the DHP method is carried out via reinforcement learning using online collected data rather than accurate system dynamics. First, the performance index and the corresponding Bellman equation are derived, and each agent's value function is shown to take a quadratic form. Then, a model network is employed to approximate the system dynamics, and the Q-function Bellman equation is obtained. By taking the derivative of the Q-function, the DHP update formula is constructed. Convergence and stability analyses of the proposed algorithm are presented. Two simulation examples are provided to illustrate the validity of the proposed algorithm.
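As a rough sketch of the kind of formulation the abstract refers to (generic notation for a discrete-time graphical-game setup, not necessarily the paper's own symbols), consider agent i with local neighborhood consensus error $\delta_i(k)$, control input $u_i(k)$, and neighbors' inputs $u_{-i}(k)$. A quadratic value function, the associated Q-function Bellman equation, and the DHP costate update obtained by differentiating it might be written as

$$V_i\big(\delta_i(k)\big)=\delta_i^{\top}(k)\,P_i\,\delta_i(k),$$

$$Q_i\big(\delta_i(k),u_i(k),u_{-i}(k)\big)=U_i\big(\delta_i(k),u_i(k),u_{-i}(k)\big)+V_i\big(\delta_i(k+1)\big),$$

$$\lambda_i(k)\triangleq\frac{\partial V_i\big(\delta_i(k)\big)}{\partial \delta_i(k)}
=\frac{\partial U_i(k)}{\partial \delta_i(k)}
+\left(\frac{\partial \delta_i(k+1)}{\partial \delta_i(k)}\right)^{\!\top}\lambda_i(k+1),$$

where $U_i$ denotes a quadratic stage cost in the local error and the control inputs, and $P_i$, $\lambda_i$ are the assumed value-function kernel and costate. In such a scheme the sensitivity $\partial\delta_i(k+1)/\partial\delta_i(k)$ would be supplied by the trained model network, so the costate (the quantity DHP approximates) can be updated from online data without the true dynamics; policy improvement then minimizes $Q_i$ with respect to $u_i(k)$, with least squares used to fit the approximator weights.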



Acknowledgements

This work was supported in part by Tianjin Natural Science Foundation under Grant 20JCYBJC00880 and the Tianjin Research Innovation Project for Postgraduate Students under Grant 2020YJSB005.

Author information

Corresponding author

Correspondence to Chaoxu Mu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Shi, H., Feng, Y., Mu, C. et al. Model-Free Optimal Consensus Control for Multi-agent Systems Based on DHP Algorithm. Neural Process Lett 54, 501–521 (2022). https://doi.org/10.1007/s11063-021-10641-4

