Abstract
The sparse reward problem has long been one of the most challenging issues in applying reinforcement learning (RL), especially to complex multi-agent systems. In this paper, a hierarchical multi-agent RL architecture is developed to address the sparse reward problem in cooperative tasks in the continuous domain. The proposed architecture is divided into two levels: the higher-level meta-agent, which implements state transitions on a larger time scale to alleviate reward sparsity, receives the global observation as spatial information and formulates sub-goals for the lower-level agents; each lower-level agent receives its local observation together with a sub-goal and carries out the cooperative task. In addition, to improve the stability of the higher-level policy, a channel is built to transmit the lower-level policy to the meta-agent as temporal information, and a two-stream structure is adopted in the meta-agent's actor-critic networks to process the spatial and temporal information. Simulation experiments on different tasks demonstrate that the proposed algorithm effectively alleviates the sparse reward problem and thereby learns the desired cooperative policies.
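To make the described architecture concrete, the following is a minimal sketch in PyTorch, not the authors' implementation: all module names, network sizes, dimensions, and the sub-goal horizon k below are hypothetical assumptions. It shows the two elements the abstract names, a two-stream meta-actor that fuses the global observation (spatial stream) with the transmitted lower-level policy information (temporal stream), and a lower-level actor conditioned on a local observation and a sub-goal.

# Minimal sketch (not the authors' code); dimensions and names are hypothetical.
import torch
import torch.nn as nn

class TwoStreamMetaActor(nn.Module):
    """Higher-level actor: (global obs, lower-level policy info) -> sub-goal."""
    def __init__(self, global_obs_dim, policy_info_dim, subgoal_dim, hidden=128):
        super().__init__()
        # Spatial stream processes the global observation.
        self.spatial = nn.Sequential(nn.Linear(global_obs_dim, hidden), nn.ReLU())
        # Temporal stream processes the lower-level policy information
        # transmitted through the channel described in the abstract.
        self.temporal = nn.Sequential(nn.Linear(policy_info_dim, hidden), nn.ReLU())
        self.head = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU(),
                                  nn.Linear(hidden, subgoal_dim), nn.Tanh())

    def forward(self, global_obs, policy_info):
        s = self.spatial(global_obs)
        t = self.temporal(policy_info)
        return self.head(torch.cat([s, t], dim=-1))

class LowLevelActor(nn.Module):
    """Lower-level actor: (local obs, sub-goal) -> continuous action."""
    def __init__(self, local_obs_dim, subgoal_dim, action_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(local_obs_dim + subgoal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh())

    def forward(self, local_obs, subgoal):
        return self.net(torch.cat([local_obs, subgoal], dim=-1))

# Usage: the meta-agent acts on a coarser time scale, emitting a new sub-goal
# every k low-level steps (k is an assumed horizon, not given in the abstract).
meta = TwoStreamMetaActor(global_obs_dim=32, policy_info_dim=16, subgoal_dim=4)
low = LowLevelActor(local_obs_dim=12, subgoal_dim=4, action_dim=2)
subgoal = meta(torch.randn(1, 32), torch.randn(1, 16))
action = low(torch.randn(1, 12), subgoal)

On this reading, the coarser time scale of the meta-agent shortens the effective horizon over which the sparse reward must propagate, which is the sense in which the hierarchy alleviates reward sparsity.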
Data availability
The data that support the findings of this study are available on request from the first author.
Funding
Funding was provided by the National Natural Science Foundation of China (Grant Nos. 61921004, 62173251, U1713209, 62103104, and 62136008) and the Natural Science Foundation of Jiangsu Province (Grant No. BK20210215).
Ethics declarations
Conflict of interest
No potential conflict of interest was reported by the authors.
About this article
Cite this article
Cao, J., Dong, L., Yuan, X. et al. Hierarchical multi-agent reinforcement learning for cooperative tasks with sparse rewards in continuous domain. Neural Comput & Applic 36, 273–287 (2024). https://doi.org/10.1007/s00521-023-08882-6