Hierarchical multi-agent reinforcement learning for cooperative tasks with sparse rewards in continuous domain

  • Original Article
  • Published in Neural Computing and Applications

Abstract

The sparse reward problem has long been one of the most challenging topics in the application of reinforcement learning (RL), especially in complex multi-agent systems. In this paper, a hierarchical multi-agent RL architecture is developed to address the sparse reward problem in cooperative tasks over continuous domains. The proposed architecture is divided into two levels. The higher-level meta-agent implements state transitions on a larger time scale to alleviate the sparse reward problem; it receives the global observation as spatial information and formulates sub-goals for the lower-level agents. Each lower-level agent receives its local observation and sub-goal and completes the cooperative task. In addition, to improve the stability of the higher-level policy, a channel is built to transmit the lower-level policy to the meta-agent as temporal information, and a two-stream structure is adopted in the meta-agent's actor-critic networks to process the spatial and temporal information. Simulation experiments on different tasks demonstrate that the proposed algorithm effectively alleviates the sparse reward problem and thereby learns the desired cooperative policies.
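The abstract fixes the control flow of the architecture (coarse-time-scale sub-goal selection at the top, goal-conditioned control at the bottom) but not its implementation. The sketch below illustrates that flow only, under stated assumptions: a Gym-style multi-agent environment, and hypothetical names (MetaAgent, Worker, run_episode, goal_interval) standing in for the actor-critic networks and update rules the paper actually trains.

# Hypothetical sketch of the two-level loop described in the abstract.
# All names and numbers here are illustrative assumptions, not the
# authors' implementation.
import numpy as np

class MetaAgent:
    """Higher level: a two-stream actor-critic would fuse the global
    observation (spatial stream) with the lower-level policy information
    (temporal stream) before emitting one sub-goal per agent."""
    def select_subgoals(self, global_obs, worker_policy_info):
        return [np.zeros(2) for _ in worker_policy_info]  # placeholder sub-goals

class Worker:
    """Lower level: acts on its local observation, conditioned on a sub-goal."""
    def act(self, local_obs, subgoal):
        return np.zeros(2)  # placeholder continuous action

def run_episode(env, meta, workers, goal_interval=10, max_steps=200):
    # Assumed env convention: reset() -> list of per-agent observations,
    # step(actions) -> (observations, rewards, done, info).
    obs = env.reset()
    subgoals = None
    for t in range(max_steps):
        # The meta-agent acts on a coarser time scale: every goal_interval
        # steps it reads the global state and re-issues sub-goals, which
        # shortens the effective horizon of the sparse-reward problem.
        if t % goal_interval == 0:
            global_obs = np.concatenate(obs)
            policy_info = list(workers)  # stand-in for the policy-transmission channel
            subgoals = meta.select_subgoals(global_obs, policy_info)
        actions = [w.act(o, g) for w, o, g in zip(workers, obs, subgoals)]
        obs, rewards, done, _ = env.step(actions)
        # Training (omitted): the meta-agent would accumulate the sparse
        # environment reward over each interval, while each worker would
        # receive a dense reward for approaching its assigned sub-goal.
        if done:
            break

Under these assumptions, the one design choice the abstract does commit to is the two-stream fusion in the meta-agent, which conditions sub-goal selection on how the lower-level policies currently behave rather than on the global state alone.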

Data availability

The data that support the findings of this study are available on request from the first author.

Funding

Funding was provided by the National Natural Science Foundation of China (Grant Nos. 61921004, 62173251, U1713209, 62103104, and 62136008) and the Natural Science Foundation of Jiangsu Province (Grant No. BK20210215).

Author information

Corresponding author

Correspondence to Changyin Sun.

Ethics declarations

Conflict of interest

No potential conflict of interest was reported by the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Cao, J., Dong, L., Yuan, X. et al. Hierarchical multi-agent reinforcement learning for cooperative tasks with sparse rewards in continuous domain. Neural Comput & Applic 36, 273–287 (2024). https://doi.org/10.1007/s00521-023-08882-6

