Multi-agent reinforcement learning behavioral control for nonlinear second-order systems

  • Research Article
  • Frontiers of Information Technology & Electronic Engineering

Abstract

Reinforcement learning behavioral control (RLBC) is limited to an individual agent without any swarm mission, because it models behavior priority learning as a Markov decision process. In this paper, a novel multi-agent reinforcement learning behavioral control (MARLBC) method is proposed to overcome this limitation through joint learning. Specifically, a multi-agent reinforcement learning mission supervisor (MARLMS) is designed for a group of nonlinear second-order systems to assign behavior priorities at the decision layer. By modeling behavior priority switching as a cooperative Markov game, the MARLMS learns an optimal joint behavior priority, which reduces dependence on human intelligence and high-performance computing hardware. At the control layer, a group of second-order reinforcement learning controllers is designed to learn the optimal control policies that track position and velocity signals simultaneously. In particular, input saturation constraints are strictly enforced by a group of adaptive compensators. Numerical simulation results show that the proposed MARLBC achieves a lower switching frequency and a lower control cost than finite-time behavioral control, fixed-time behavioral control, and RLBC methods.
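To make the idea of learning a joint behavior priority in a cooperative Markov game more concrete, the following is a minimal sketch using tabular Q-learning over joint actions. It is not the MARLMS algorithm from the paper (Algorithm S1 is not reproduced on this page). The two priority labels, the one-bit obstacle state, the reward values, and every function name are assumptions chosen purely for illustration.

```python
import itertools
import random

# Hypothetical setup (not from the paper): at each decision step every agent
# picks one of two behavior priority orderings.
PRIORITIES = ("mission_first", "avoid_first")
N_AGENTS = 2
# A joint action assigns one priority index to each agent.
JOINT_ACTIONS = list(itertools.product(range(len(PRIORITIES)), repeat=N_AGENTS))
STATES = (0, 1)  # toy state: 0 = path clear, 1 = obstacle nearby


def team_reward(state, joint_action):
    """Shared (cooperative) reward; the numbers are illustrative only."""
    wanted = 1 if state == 1 else 0  # prefer avoidance when an obstacle is near
    # Every agent contributes +1 or -1, and all agents receive the same sum.
    return sum(1.0 if a == wanted else -1.0 for a in joint_action)


def next_state(rng):
    """Toy transition: the obstacle flag is resampled independently of actions."""
    return rng.choice(STATES)


def train(steps=5000, alpha=0.1, gamma=0.5, epsilon=0.3, seed=0):
    """Tabular Q-learning on the joint action space with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in JOINT_ACTIONS}
    state = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(JOINT_ACTIONS)  # explore
        else:
            action = max(JOINT_ACTIONS, key=lambda a: q[(state, a)])  # exploit
        reward = team_reward(state, action)
        new_state = next_state(rng)
        best_next = max(q[(new_state, a)] for a in JOINT_ACTIONS)
        # Standard Q-learning update applied to the joint action value.
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = new_state
    return q


def greedy_joint_priority(q, state):
    """Return the learned joint priority (one label per agent) for a state."""
    best = max(JOINT_ACTIONS, key=lambda a: q[(state, a)])
    return tuple(PRIORITIES[i] for i in best)


if __name__ == "__main__":
    q_table = train()
    for s in STATES:
        print(s, greedy_joint_priority(q_table, s))
```

Learning over the joint action space is what distinguishes this from running one single-agent learner per agent: the value table scores combinations of priorities, so the shared reward can favor assignments that are only good when the agents choose them together. The joint table grows exponentially with the number of agents, so this sketch is only practical for very small groups.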

Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Author information

Contributions

Jie HUANG and Zhenyi ZHANG designed the research. Zhenyi ZHANG and Congjie PAN processed the data and drafted the paper. Jie HUANG revised and finalized the paper.

Corresponding author

Correspondence to Jie Huang (黄捷).

Ethics declarations

All the authors declare that they have no conflict of interest.

Additional information

Project supported by the National Natural Science Foundation of China (No. 92367109).

List of supplementary materials

1 Proof of Theorem 1

2 Proof of mission stability

3 Proof of boundedness

Algorithm S1 MARLMS

About this article

Cite this article

Zhang, Z., Huang, J. & Pan, C. Multi-agent reinforcement learning behavioral control for nonlinear second-order systems. Front Inform Technol Electron Eng 25, 869–886 (2024). https://doi.org/10.1631/FITEE.2300394

  • DOI: https://doi.org/10.1631/FITEE.2300394
