Abstract
This paper presents a novel approach to Multi-Agent Reinforcement Learning (MARL) that combines cooperative task decomposition with the learning of Reward Machines (RMs) encoding the structure of the sub-tasks. The proposed method helps address the non-Markovian nature of the rewards in partially observable environments and improves the interpretability of the learnt policies required to complete a cooperative task. The RMs associated with the sub-tasks are learnt in a decentralised manner and then used to guide the behaviour of each agent in a team acting towards a common goal. This decomposition reduces the complexity of the cooperative multi-agent problem, allowing for more effective learning. The results suggest that our approach is a promising direction for future research in cooperative MARL, especially in complex and partially observable environments.
Notes
1. This is equivalent to using a 2-state RM that reaches the final state only when the task is completed.
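The 2-state RM mentioned in the note can be illustrated with a minimal sketch. It is not the authors' implementation: the class `RewardMachine`, the state names `u0` and `u_acc`, the event label `task_done`, and the reward of 1.0 on completion are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RewardMachine:
    """Finite-state machine whose transitions fire on high-level event labels.

    Hypothetical helper for illustration; names and rewards are assumptions.
    """
    initial: str
    final: str
    # Maps (state, label) -> (next_state, reward).
    transitions: dict = field(default_factory=dict)

    def step(self, state, label):
        # Any (state, label) pair not listed keeps the state and yields zero reward.
        return self.transitions.get((state, label), (state, 0.0))

    def run(self, labels):
        # Feed a sequence of labels through the machine, stopping at the final state.
        state, total = self.initial, 0.0
        for label in labels:
            state, reward = self.step(state, label)
            total += reward
            if state == self.final:
                break
        return state, total


# 2-state RM: the only transition leads to the final state on task completion.
two_state_rm = RewardMachine(
    initial="u0",
    final="u_acc",
    transitions={("u0", "task_done"): ("u_acc", 1.0)},
)
```

Under these assumptions, every label other than `task_done` leaves the machine in `u0`, so the RM carries no information about intermediate sub-goals. That is the sense in which it only signals completion of the task.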
Acknowledgements
Research was sponsored by the Army Research Laboratory and was accomplished under Cooperative Agreement Number W911NF-22-2-0243. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory or the U.S. Government. The U.S. Government is authorised to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ardon, L., Furelos-Blanco, D., Russo, A. (2024). Learning Reward Machines in Cooperative Multi-agent Tasks. In: Amigoni, F., Sinha, A. (eds) Autonomous Agents and Multiagent Systems. Best and Visionary Papers. AAMAS 2023. Lecture Notes in Computer Science, vol 14456. Springer, Cham. https://doi.org/10.1007/978-3-031-56255-6_3
DOI: https://doi.org/10.1007/978-3-031-56255-6_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-56254-9
Online ISBN: 978-3-031-56255-6
eBook Packages: Computer Science, Computer Science (R0)