Breaking Deadlocks in Multi-agent Reinforcement Learning with Sparse Interaction

  • Conference paper
  • In: PRICAI 2019: Trends in Artificial Intelligence (PRICAI 2019)

Abstract

Although multi-agent reinforcement learning (MARL) is a promising method for learning a collaborative action policy that enables each agent to accomplish specific tasks, the state-action space grows exponentially with the number of agents. Coordinating Q-learning (CQ-learning) effectively reduces the state-action space by having each agent determine when it should consider the states of other agents, on the basis of a comparison between the immediate rewards in a single-agent environment and those in a multi-agent environment. One way to improve the performance of CQ-learning is to have agents greedily select actions and switch between Q-value update equations in accordance with the state of each agent in the next step. Although this “GPCQ-learning” usually outperforms CQ-learning, a deadlock can occur if there is no difference in the immediate rewards between a single-agent environment and a multi-agent environment. A method has been developed to break such a deadlock by detecting its occurrence and augmenting the state of a deadlocked agent to include the state of the other agent. Evaluation of the method using pursuit games demonstrated that it improves the performance of GPCQ-learning.
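The abstract describes the mechanism only at a high level, so the following is a minimal Python sketch of how deadlock detection and state augmentation could be wired into a tabular Q-learning loop. Everything here is an assumption for illustration: the class and method names are invented, the revisit-count heuristic stands in for the paper's actual deadlock-detection test, and the sketch omits GPCQ-learning's switching between Q-value update equations.

```python
import random
from collections import defaultdict

# Hypothetical sketch of the deadlock-breaking idea described in the abstract.
# Names, thresholds, and the detection heuristic are illustrative assumptions,
# not the paper's actual formulation.

class DeadlockBreakingAgent:
    def __init__(self, actions, alpha=0.1, gamma=0.9, deadlock_patience=5):
        self.actions = actions
        self.alpha = alpha                     # learning rate
        self.gamma = gamma                     # discount factor
        self.q_local = defaultdict(float)      # Q over the agent's own state
        self.q_joint = defaultdict(float)      # Q over augmented (joint) states
        self.augmented = set()                 # local states conditioned on the other agent
        self.patience = deadlock_patience
        self.stuck_count = 0
        self.prev_state = None

    def detect_deadlock(self, state):
        # Assumed heuristic: the agent keeps revisiting the same state even
        # though immediate rewards match the single-agent case, so CQ-learning's
        # reward comparison never triggers augmentation on its own.
        self.stuck_count = self.stuck_count + 1 if state == self.prev_state else 0
        self.prev_state = state
        return self.stuck_count >= self.patience

    def select_action(self, state, other_state, epsilon=0.05):
        # Greedy selection (the "G" in GPCQ-learning), with a little exploration.
        augmented = state in self.augmented
        q = self.q_joint if augmented else self.q_local
        key = (state, other_state) if augmented else state
        if random.random() < epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: q[(key, a)])

    def update(self, state, other_state, action, reward, next_state, next_other):
        # On detected deadlock, augment this local state with the other
        # agent's state so the two situations become distinguishable.
        if self.detect_deadlock(state):
            self.augmented.add(state)
        if state in self.augmented:
            q = self.q_joint
            key, next_key = (state, other_state), (next_state, next_other)
        else:
            q = self.q_local
            key, next_key = state, next_state
        best_next = max(q[(next_key, a)] for a in self.actions)
        q[(key, action)] += self.alpha * (reward + self.gamma * best_next - q[(key, action)])
```

Keeping a joint-state Q-table only for the augmented local states mirrors the sparse-interaction idea: the table grows only where coordination is actually needed, rather than over the full joint state-action space.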

Author information

Corresponding author

Correspondence to Toshihiro Kujirai.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Kujirai, T., Yokota, T. (2019). Breaking Deadlocks in Multi-agent Reinforcement Learning with Sparse Interaction. In: Nayak, A., Sharma, A. (eds.) PRICAI 2019: Trends in Artificial Intelligence. Lecture Notes in Computer Science, vol. 11670. Springer, Cham. https://doi.org/10.1007/978-3-030-29908-8_58

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-29908-8_58

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-29907-1

  • Online ISBN: 978-3-030-29908-8

  • eBook Packages: Computer Science, Computer Science (R0)
