Abstract
Independent agents learning by reinforcement must overcome several difficulties, including non-stationarity, miscoordination, and relative overgeneralization. An independent learner may receive different rewards for the same state and action at different time steps, depending on the actions the other agents take in that state. Existing multi-agent learning methods address these issues with various techniques, such as hysteresis or leniency, but they all update the Q function using only the latest reward signal. Instead, we propose to keep track of the rewards received for each state-action pair and to use a hybrid approach for updating the Q-values: the agents initially adopt an optimistic disposition by using the maximum reward observed, and then transform into average-reward learners. We show both analytically and empirically that this technique improves the convergence and stability of learning, and deals robustly with relative overgeneralization, miscoordination, and a high degree of stochasticity in the reward and transition functions. Our method outperforms state-of-the-art multi-agent learning algorithms across a spectrum of stochastic and partially observable games, while requiring little parameter tuning.
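To make the hybrid update concrete, the following is a minimal sketch of one possible reading of the abstract: each agent stores, per state-action pair, the maximum and running average of observed rewards, and bootstraps its Q-value from the maximum early on (optimism) before switching to the average. The class name `HybridIndependentLearner`, the fixed visit-count switch `switch_visits`, and the epsilon-greedy policy are illustrative assumptions, not the paper's exact algorithm or schedule.

```python
import random
from collections import defaultdict

class HybridIndependentLearner:
    """Sketch of a hybrid independent Q-learner (assumed design, not the
    paper's exact method): optimistic (max-reward) updates at first,
    then average-reward updates once a state-action pair is well visited."""

    def __init__(self, actions, alpha=0.1, gamma=0.95, switch_visits=100):
        self.actions = actions                  # the agent's own action set
        self.alpha = alpha                      # learning rate
        self.gamma = gamma                      # discount factor
        self.switch_visits = switch_visits      # assumed optimism horizon
        self.Q = defaultdict(float)             # Q[(s, a)]
        self.max_r = defaultdict(lambda: float("-inf"))  # max reward seen
        self.sum_r = defaultdict(float)         # reward sum for averaging
        self.count = defaultdict(int)           # visit count per (s, a)

    def update(self, s, a, r, s_next):
        key = (s, a)
        # Track per-(s, a) reward statistics instead of using only the
        # latest reward signal.
        self.count[key] += 1
        self.sum_r[key] += r
        self.max_r[key] = max(self.max_r[key], r)

        # Optimistic phase: use the best reward observed so far; later,
        # fall back to the empirical average to cope with stochastic rewards.
        if self.count[key] < self.switch_visits:
            r_eff = self.max_r[key]
        else:
            r_eff = self.sum_r[key] / self.count[key]

        # Standard Q-learning backup, but with the effective reward r_eff.
        best_next = max(self.Q[(s_next, b)] for b in self.actions)
        self.Q[key] += self.alpha * (r_eff + self.gamma * best_next - self.Q[key])

    def act(self, s, epsilon=0.1):
        # Epsilon-greedy selection over the agent's own Q-values.
        if random.random() < epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.Q[(s, a)])
```

The per-pair switch means each agent remains optimistic about rarely visited actions (helping against relative overgeneralization and miscoordination) while converging to average-reward estimates where it has enough samples to trust them.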
Notes
1. We sometimes omit the subscript i when it is clear that we are referring to a specific agent.
References
Agogino, A.K., Tumer, K.: A multiagent approach to managing air traffic flow. Auton. Agent. Multi-Agent Syst. 24(1), 1–25 (2012). https://doi.org/10.1007/s10458-010-9142-5
Amato, C., Dibangoye, J.S., Zilberstein, S.: Incremental policy generation for finite-horizon DEC-POMDPs. In: International Conference on Automated Planning and Scheduling (ICAPS) (2009)
Claus, C., Boutilier, C.: The dynamics of reinforcement learning in cooperative multiagent systems. AAAI/IAAI 1998, 746–752 (1998)
Fulda, N., Ventura, D.: Predicting and preventing coordination problems in cooperative q-learning systems. Int. Joint Conf. Artif. Intell. (IJCAI). 2007, 780–785 (2007)
Lauer, M., Riedmiller, M.: An algorithm for distributed reinforcement learning in cooperative multi-agent systems. In: International Conference on Machine Learning (ICML) (2000)
Leibo, J.Z., Zambaldi, V., Lanctot, M., Marecki, J., Graepel, T.: Multi-agent reinforcement learning in sequential social dilemmas. In: International Conference on Autonomous Agents and Multiagent Systems (AAMAS), pp. 464–473 (2017)
Littman, M.L.: Markov games as a framework for multi-agent reinforcement learning. In: International Conference on Machine Learning (ICML), pp. 157–163 (1994)
Matignon, L., Laurent, G.J., Le Fort-Piat, N.: Hysteretic q-learning: an algorithm for decentralized reinforcement learning in cooperative multi-agent teams. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 64–69 (2007)
Matignon, L., Laurent, G.J., Le Fort-Piat, N.: Independent reinforcement learners in cooperative markov games: a survey regarding coordination problems. Knowl. Eng. Rev. 27(1), 1–31 (2012)
Nash, J.F.: Equilibrium points in n-person games. Proc. Natl. Acad. Sci. U.S.A. 36(1), 48–49 (1950)
Palmer, G., Savani, R., Tuyls, K.: Negative update intervals in deep multi-agent reinforcement learning. In: International Conference on Autonomous Agents and MultiAgent Systems (AAMAS), pp. 43–51 (2019)
Panait, L., Sullivan, K., Luke, S.: Lenient learners in cooperative multiagent systems. In: International Conference on Autonomous Agents and Multiagent Systems (AAMAS), pp. 801–803 (2006)
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (2018)
Verbeeck, K., Nowé, A., Parent, J., Tuyls, K.: Exploring selfish reinforcement learning in repeated games with stochastic rewards. Auton. Agent. Multi-Agent Syst. 14(3), 239–269 (2007)
Vrancx, P., Tuyls, K., Westra, R.: Switching dynamics of multi-agent learning. Int. Conf. Auton. Agent. Multiagent Syst. (AAMAS) 1, 307–313 (2008)
Wang, Y., De Silva, C.W.: Multi-robot box-pushing: single-agent q-learning vs. team q-learning. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 3694–3699 (2006)
Watkins, C.J., Dayan, P.: Q-learning. Mach. Learn. 8(3–4), 279–292 (1992)
Wei, E., Luke, S.: Lenient learning in independent-learner stochastic cooperative games. J. Mach. Learn. Res. 17(1), 2914–2955 (2016)
Yang, E., Gu, D.: Multiagent reinforcement learning for multi-robot systems: A survey. Technical report, Department of Computer Science, University of Essex, Technical report (2004)
Acknowledgments
This work is funded by the U.S. Air Force Research Laboratory (AFRL), BAA Number: FA8750-18-S-7007, and NSF grant no. 1816382.