Learning Reward Machines in Cooperative Multi-agent Tasks

Ardon, Leo; Furelos-Blanco, Daniel; Russo, Alessandra

doi:10.1007/978-3-031-56255-6_3

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14456)

Included in the following conference series:

International Conference on Autonomous Agents and Multiagent Systems

Abstract

This paper presents a novel approach to Multi-Agent Reinforcement Learning (MARL) that combines cooperative task decomposition with the learning of Reward Machines (RMs) encoding the structure of the sub-tasks. The proposed method helps deal with the non-Markovian nature of the rewards in partially observable environments and improves the interpretability of the learnt policies required to complete a cooperative task. The RMs associated with the sub-tasks are learnt in a decentralised manner and then used to guide the behaviour of each agent in a team acting towards a common goal. By doing so, the complexity of a cooperative multi-agent problem is reduced, allowing for more effective learning. The results suggest that our approach is a promising direction for future research in cooperative MARL, especially in complex and partially observable environments.

Notes

1.
This is equivalent to using a 2-state RM that reaches its final state only when the task is completed.
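
The 2-state RM described in this note can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the class `RewardMachine`, the helpers `two_state_rm` and `run`, the state names `u0` and `u_acc`, the label `task_done`, and the reward value 1.0 are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class RewardMachine:
    """Finite-state machine whose transitions fire on observed labels."""
    initial: str
    final: str
    # (state, label) -> (next_state, reward); label names are hypothetical
    transitions: dict = field(default_factory=dict)

    def step(self, state, labels):
        """Return (next_state, reward) after observing a set of labels."""
        for label in sorted(labels):  # sorted for a deterministic scan order
            if (state, label) in self.transitions:
                return self.transitions[(state, label)]
        return state, 0.0  # no matching transition: stay put, no reward


def two_state_rm(done_label="task_done"):
    """Build an RM with one initial and one final state (assumed reward 1.0)."""
    return RewardMachine(
        initial="u0",
        final="u_acc",
        transitions={("u0", done_label): ("u_acc", 1.0)},
    )


def run(rm, trace):
    """Feed a sequence of label sets to the RM; stop at the final state."""
    state, total = rm.initial, 0.0
    for labels in trace:
        state, reward = rm.step(state, labels)
        total += reward
        if state == rm.final:
            break
    return state, total


if __name__ == "__main__":
    rm = two_state_rm()
    print(run(rm, [set(), {"button"}, {"task_done"}]))
```

With only two states, the machine carries no information about intermediate progress: every label other than the completion label leaves it in the initial state with zero reward.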

Acknowledgements

Research was sponsored by the Army Research Laboratory and was accomplished under Cooperative Agreement Number W911NF-22-2-0243. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory or the U.S. Government. The U.S. Government is authorised to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein.

Author information

Authors and Affiliations

Imperial College London, London, UK
Leo Ardon, Daniel Furelos-Blanco & Alessandra Russo

Corresponding author

Correspondence to Leo Ardon.

Editor information

Editors and Affiliations

Politecnico di Milano, Milan, Italy
Francesco Amigoni
Rutgers University, Piscataway, NJ, USA
Arunesh Sinha

About this paper

Cite this paper

Ardon, L., Furelos-Blanco, D., Russo, A. (2024). Learning Reward Machines in Cooperative Multi-agent Tasks. In: Amigoni, F., Sinha, A. (eds) Autonomous Agents and Multiagent Systems. Best and Visionary Papers. AAMAS 2023. Lecture Notes in Computer Science, vol 14456. Springer, Cham. https://doi.org/10.1007/978-3-031-56255-6_3

DOI: https://doi.org/10.1007/978-3-031-56255-6_3
Published: 30 March 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-56254-9
Online ISBN: 978-3-031-56255-6
eBook Packages: Computer Science, Computer Science (R0)
