
Model-Free Optimal Consensus Control for Multi-agent Systems Based on DHP Algorithm


Abstract

This paper develops a novel model-free dual heuristic dynamic programming (DHP) algorithm, combined with policy iteration and least-squares techniques, to achieve optimal consensus control of discrete-time multi-agent systems. Optimal consensus control requires solving the coupled Hamilton-Jacobi-Bellman (HJB) equations, which is generally difficult, especially when the mathematical models are unknown. To overcome these difficulties, the DHP method is carried out via reinforcement learning using online collected data rather than accurate system dynamics. First, the performance index and the corresponding Bellman equation are derived, and each agent's value function is shown to take a quadratic form. Then, a model network is employed to approximate the system dynamics, and the Q-function Bellman equation is obtained. By taking the derivative of the Q-function, the DHP update formula is constructed. Convergence and stability analyses of the proposed algorithm are presented. Two simulation examples are provided to illustrate the validity of the proposed algorithm.
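As a rough sketch of the kind of formulation the abstract refers to (generic notation for a discrete-time graphical-game setup, not necessarily the paper's own symbols), consider agent i with local neighborhood consensus error $\delta_i(k)$, control input $u_i(k)$, and neighbors' inputs $u_{-i}(k)$. A quadratic value function, the associated Q-function Bellman equation, and the DHP costate update obtained by differentiating it might be written as

$$V_i\big(\delta_i(k)\big)=\delta_i^{\top}(k)\,P_i\,\delta_i(k),$$

$$Q_i\big(\delta_i(k),u_i(k),u_{-i}(k)\big)=U_i\big(\delta_i(k),u_i(k),u_{-i}(k)\big)+V_i\big(\delta_i(k+1)\big),$$

$$\lambda_i(k)\triangleq\frac{\partial V_i\big(\delta_i(k)\big)}{\partial \delta_i(k)}
=\frac{\partial U_i(k)}{\partial \delta_i(k)}
+\left(\frac{\partial \delta_i(k+1)}{\partial \delta_i(k)}\right)^{\!\top}\lambda_i(k+1),$$

where $U_i$ denotes a quadratic stage cost in the local error and the control inputs, and $P_i$, $\lambda_i$ are the assumed value-function kernel and costate. In such a scheme the sensitivity $\partial\delta_i(k+1)/\partial\delta_i(k)$ would be supplied by the trained model network, so the costate (the quantity DHP approximates) can be updated from online data without the true dynamics; policy improvement then minimizes $Q_i$ with respect to $u_i(k)$, with least squares used to fit the approximator weights.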



Acknowledgements

This work was supported in part by Tianjin Natural Science Foundation under Grant 20JCYBJC00880 and the Tianjin Research Innovation Project for Postgraduate Students under Grant 2020YJSB005.

Author information

Corresponding author

Correspondence to Chaoxu Mu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Shi, H., Feng, Y., Mu, C. et al. Model-Free Optimal Consensus Control for Multi-agent Systems Based on DHP Algorithm. Neural Process Lett 54, 501–521 (2022). https://doi.org/10.1007/s11063-021-10641-4

