Abstract
The sparse reward problem has long been one of the most challenging issues in applying reinforcement learning (RL), especially to complex multi-agent systems. In this paper, a hierarchical multi-agent RL architecture is developed to address the sparse reward problem in cooperative tasks in the continuous domain. The proposed architecture is divided into two levels: the higher-level meta-agent, which implements state transitions on a larger time scale to alleviate reward sparsity, receives the global observation as spatial information and formulates sub-goals for the lower-level agents; each lower-level agent receives its local observation together with a sub-goal and carries out the cooperative task. In addition, to improve the stability of the higher-level policy, a channel is built to transmit the lower-level policy to the meta-agent as temporal information, and a two-stream structure is adopted in the meta-agent's actor-critic networks to process the spatial and temporal information. Simulation experiments on different tasks demonstrate that the proposed algorithm effectively alleviates the sparse reward problem and thereby learns the desired cooperative policies.
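To make the described architecture concrete, the following is a minimal sketch in PyTorch, not the authors' implementation: all module names, network sizes, dimensions, and the sub-goal horizon k below are hypothetical assumptions. It shows the two elements the abstract names, a two-stream meta-actor that fuses the global observation (spatial stream) with the transmitted lower-level policy information (temporal stream), and a lower-level actor conditioned on a local observation and a sub-goal.

# Minimal sketch (not the authors' code); dimensions and names are hypothetical.
import torch
import torch.nn as nn

class TwoStreamMetaActor(nn.Module):
    """Higher-level actor: (global obs, lower-level policy info) -> sub-goal."""
    def __init__(self, global_obs_dim, policy_info_dim, subgoal_dim, hidden=128):
        super().__init__()
        # Spatial stream processes the global observation.
        self.spatial = nn.Sequential(nn.Linear(global_obs_dim, hidden), nn.ReLU())
        # Temporal stream processes the lower-level policy information
        # transmitted through the channel described in the abstract.
        self.temporal = nn.Sequential(nn.Linear(policy_info_dim, hidden), nn.ReLU())
        self.head = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU(),
                                  nn.Linear(hidden, subgoal_dim), nn.Tanh())

    def forward(self, global_obs, policy_info):
        s = self.spatial(global_obs)
        t = self.temporal(policy_info)
        return self.head(torch.cat([s, t], dim=-1))

class LowLevelActor(nn.Module):
    """Lower-level actor: (local obs, sub-goal) -> continuous action."""
    def __init__(self, local_obs_dim, subgoal_dim, action_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(local_obs_dim + subgoal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh())

    def forward(self, local_obs, subgoal):
        return self.net(torch.cat([local_obs, subgoal], dim=-1))

# Usage: the meta-agent acts on a coarser time scale, emitting a new sub-goal
# every k low-level steps (k is an assumed horizon, not given in the abstract).
meta = TwoStreamMetaActor(global_obs_dim=32, policy_info_dim=16, subgoal_dim=4)
low = LowLevelActor(local_obs_dim=12, subgoal_dim=4, action_dim=2)
subgoal = meta(torch.randn(1, 32), torch.randn(1, 16))
action = low(torch.randn(1, 12), subgoal)

On this reading, the coarser time scale of the meta-agent shortens the effective horizon over which the sparse reward must propagate, which is the sense in which the hierarchy alleviates reward sparsity.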
Data availability
The data that support the findings of this study are available on request from the first author.
Funding
Funding was provided by the National Natural Science Foundation of China (Grant Nos. 61921004, 62173251, U1713209, 62103104, and 62136008) and the Natural Science Foundation of Jiangsu Province (Grant No. BK20210215).
Ethics declarations
Conflict of interest
No potential conflict of interest was reported by the authors.
About this article
Cite this article
Cao, J., Dong, L., Yuan, X. et al. Hierarchical multi-agent reinforcement learning for cooperative tasks with sparse rewards in continuous domain. Neural Comput & Applic 36, 273–287 (2024). https://doi.org/10.1007/s00521-023-08882-6