Abstract
Cooperation in multi-agent reinforcement learning (MARL) facilitates the acquisition of complex problem-solving skills and promotes more efficient and effective decision-making among agents. Numerous strategies for cooperative learning in MARL exist, including joint action learning, task decomposition, role assignment, and communication protocols. However, deploying these strategies in complex and dynamic environments remains challenging. To address these challenges, we propose a technique that uses reward sharing to enhance cooperation in partially observable multi-agent environments. As an extension of reward shaping, reward sharing allows agents to work together towards a global objective while still pursuing their local objectives. This approach can foster cooperation and reduce competition between agents without explicit communication, ultimately leading to faster learning and better performance. This study uses simulation to compare three reward sharing techniques, the Performance Incentive (PI), the Observer’s Share (OS), and the Synergy Achievement (SA), in the context of dynamic target localization. The techniques are then evaluated under varying objective prioritization, agent counts, and map sizes. The results show that the proposed reward sharing techniques enhance agent performance, that increasing the number of agents leads to higher rewards, and that average rewards are negatively correlated with map size.
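The general idea of reward sharing described in the abstract can be illustrated with a minimal sketch. This is not the paper's PI, OS, or SA scheme, none of which the abstract defines; the function name `share_rewards`, the parameter `share_weight`, and the linear blend of local reward with the team mean are all assumptions made purely for illustration.

```python
from typing import List


def share_rewards(local_rewards: List[float], share_weight: float) -> List[float]:
    """Blend each agent's local reward with the team's mean reward.

    Hypothetical illustration only: share_weight = 0.0 keeps rewards purely
    local, share_weight = 1.0 gives every agent the team mean. The linear
    blend is an assumption, not the method evaluated in the paper.
    """
    if not local_rewards:
        return []
    if not 0.0 <= share_weight <= 1.0:
        raise ValueError("share_weight must lie in [0, 1]")
    team_mean = sum(local_rewards) / len(local_rewards)
    # Each agent keeps (1 - share_weight) of its own reward and receives
    # share_weight of the team average, so the team total is unchanged.
    return [
        (1.0 - share_weight) * r + share_weight * team_mean
        for r in local_rewards
    ]


if __name__ == "__main__":
    # One agent locates the target (reward 3.0); the other two get nothing.
    print(share_rewards([3.0, 0.0, 0.0], 0.5))
```

Under this assumed blend, agents that did not find the target still receive part of the team's success, which is one way a shared signal could encourage cooperation without explicit communication.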
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Wickramaarachchi, H., Kirley, M., Geard, N. (2024). Cooperative Multi-Agent Reinforcement Learning with Dynamic Target Localization: A Reward Sharing Approach. In: Liu, T., Webb, G., Yue, L., Wang, D. (eds) AI 2023: Advances in Artificial Intelligence. AI 2023. Lecture Notes in Computer Science, vol 14472. Springer, Singapore. https://doi.org/10.1007/978-981-99-8391-9_25
DOI: https://doi.org/10.1007/978-981-99-8391-9_25
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8390-2
Online ISBN: 978-981-99-8391-9
eBook Packages: Computer Science, Computer Science (R0)