Multi-agent reinforcement learning behavioral control for nonlinear second-order systems

Zhang, Zhenyi; Huang, Jie; Pan, Congjie

doi:10.1631/FITEE.2300394

Research Article
Published: 05 July 2024

Volume 25, pages 869–886 (2024)

Frontiers of Information Technology & Electronic Engineering

Abstract

Reinforcement learning behavioral control (RLBC) is limited to an individual agent without any swarm mission, because it models behavior priority learning as a Markov decision process. In this paper, a novel multi-agent reinforcement learning behavioral control (MARLBC) method is proposed to overcome this limitation through joint learning. Specifically, a multi-agent reinforcement learning mission supervisor (MARLMS) is designed for a group of nonlinear second-order systems to assign behavior priorities at the decision layer. By modeling behavior priority switching as a cooperative Markov game, the MARLMS learns an optimal joint behavior priority, reducing dependence on human intelligence and high-performance computing hardware. At the control layer, a group of second-order reinforcement learning controllers is designed to learn the optimal control policies that track position and velocity signals simultaneously. In particular, input saturation constraints are strictly enforced by a group of adaptive compensators. Numerical simulation results show that the proposed MARLBC has a lower switching frequency and control cost than finite-time behavioral control, fixed-time behavioral control, and RLBC methods.
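
To make the cooperative Markov game idea concrete, the following is a minimal illustrative sketch and not the authors' MARLMS. It assumes a toy setting invented for this example: two agents, two candidate priority orders per agent, a binary state, a hand-written shared reward, and plain tabular Q-learning over joint actions. All names (`team_reward`, `train`, `greedy_joint_priority`) and all parameter values are hypothetical.

```python
import itertools
import random

# Hypothetical toy setup (not from the paper): 2 agents, 2 candidate priority
# orders per agent, and a binary state (0 = path clear, 1 = obstacle near).
N_AGENTS = 2
PRIORITIES = (0, 1)  # 0: tracking mission first, 1: obstacle avoidance first
STATES = (0, 1)
JOINT_ACTIONS = list(itertools.product(PRIORITIES, repeat=N_AGENTS))


def team_reward(state, joint_action):
    """Shared reward: +1 per agent whose priority matches the state, else -1."""
    return sum(1.0 if a == state else -1.0 for a in joint_action)


def step(state, joint_action, rng):
    """Toy dynamics: the next state is random and independent of the action."""
    return rng.choice(STATES), team_reward(state, joint_action)


def train(steps=5000, alpha=0.1, gamma=0.5, epsilon=0.2, seed=0):
    """Tabular Q-learning over JOINT priority assignments with a team reward."""
    rng = random.Random(seed)
    # A single Q-table indexed by (state, joint action): the agents are
    # learned jointly instead of as independent single-agent MDPs.
    q = {(s, a): 0.0 for s in STATES for a in JOINT_ACTIONS}
    state = rng.choice(STATES)
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(JOINT_ACTIONS)  # explore
        else:
            action = max(JOINT_ACTIONS, key=lambda a: q[(state, a)])  # exploit
        next_state, reward = step(state, action, rng)
        best_next = max(q[(next_state, a)] for a in JOINT_ACTIONS)
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = next_state
    return q


def greedy_joint_priority(q, state):
    """Return the joint priority assignment with the highest learned value."""
    return max(JOINT_ACTIONS, key=lambda a: q[(state, a)])


if __name__ == "__main__":
    q_table = train()
    for s in STATES:
        print("state", s, "-> joint priority", greedy_joint_priority(q_table, s))
```

In this toy game the learned joint assignment puts the tracking mission first for both agents when the path is clear and obstacle avoidance first when an obstacle is near. The joint action space grows exponentially with the number of agents, so a tabular table like this is only practical for very small groups.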

Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Author information

Authors and Affiliations

College of Electrical Engineering and Automation, Fuzhou University, Fuzhou, 350108, China
Zhenyi Zhang (张祯毅), Jie Huang (黄捷) & Congjie Pan (潘聪捷)
5G+ Industrial Internet Institute, Fuzhou University, Fuzhou, 350108, China
Zhenyi Zhang (张祯毅), Jie Huang (黄捷) & Congjie Pan (潘聪捷)

Contributions

Jie HUANG and Zhenyi ZHANG designed the research. Zhenyi ZHANG and Congjie PAN processed the data and drafted the paper. Jie HUANG revised and finalized the paper.

Corresponding author

Correspondence to Jie Huang (黄捷).

Ethics declarations

All the authors declare that they have no conflict of interest.

Additional information

Project supported by the National Natural Science Foundation of China (No. 92367109).

List of supplementary materials

1 Proof of Theorem 1

2 Proof of mission stability

3 Proof of boundedness

Algorithm S1 MARLMS

Supplementary materials for