A Convergent Multiagent Reinforcement Learning Approach for a Subclass of Cooperative Stochastic Games

Kemmerich, Thomas; Kleine Büning, Hans

doi:10.1007/978-3-642-28499-1_3


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7113)

Included in the following conference series:

International Workshop on Adaptive and Learning Agents


Abstract

We present a distributed Q-Learning approach for independently learning agents in a subclass of cooperative stochastic games called cooperative sequential stage games. In this subclass, several stage games are played one after the other. We also propose a transformation function for that class and prove that transformed and original games have the same set of optimal joint strategies. Under the condition that the played game is obtained through transformation, we prove that our approach converges to an optimal joint strategy for the last stage game of the transformed game and thus also for the original game. In addition, we show that it can converge to ε-optimal joint strategies for each of the stage games. The environment in our approach does not need to present a state signal to the agents. Instead, through the transformation function, the agents learn about state changes from an engineered reward. This lets agents avoid storing a strategy for every state and instead maintain a single strategy that is adapted to the currently played stage game. Thus, the algorithm has very low space requirements, and its complexity is comparable to that of single-agent Q-Learning. Besides the theoretical analysis, we also illustrate the convergence properties with experiments.
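
To give a rough feel for the setting, the following minimal sketch shows two independent learners in a single, stateless cooperative stage game. It is not the algorithm or the transformation function of the paper, neither of which is specified in this abstract. Instead it assumes a deterministic common payoff and uses a simple optimistic update, in which each agent remembers the best payoff ever observed for each of its own actions. The payoff matrix `REWARD` and the names `train_independent_learners` and `greedy` are hypothetical and chosen only for illustration.

```python
import random

# Hypothetical 3x3 cooperative stage game: both agents receive
# REWARD[a1][a2]. The unique optimal joint action is (0, 0).
REWARD = [
    [11, -30, 0],
    [-30, 7, 6],
    [0, 0, 5],
]


def train_independent_learners(reward, rounds=500, seed=0):
    """Train two independent learners that each keep one value per own action.

    Assumption: rewards are deterministic, so an optimistic max-update
    is sufficient. No state signal is given to the agents.
    """
    rng = random.Random(seed)
    n1, n2 = len(reward), len(reward[0])
    q1 = [float("-inf")] * n1  # agent 1 never observes agent 2's action
    q2 = [float("-inf")] * n2  # agent 2 never observes agent 1's action
    for _ in range(rounds):
        a1 = rng.randrange(n1)  # uniform random exploration, for simplicity
        a2 = rng.randrange(n2)
        r = reward[a1][a2]  # common payoff shared by both agents
        # Optimistic update: keep the best payoff seen so far for this action.
        q1[a1] = max(q1[a1], r)
        q2[a2] = max(q2[a2], r)
    return q1, q2


def greedy(q):
    """Return the index of the action with the highest stored value."""
    return max(range(len(q)), key=q.__getitem__)


q1, q2 = train_independent_learners(REWARD)
joint = (greedy(q1), greedy(q2))
print(joint, REWARD[joint[0]][joint[1]])
```

Each agent stores only one value per own action, which mirrors the low space requirement mentioned in the abstract. The sketch relies on the optimal joint action being unique; with several optimal joint actions, independent greedy choices could miscoordinate, and an additional coordination mechanism would be needed.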


Author information

Authors and Affiliations

International Graduate School Dynamic Intelligent Systems, University of Paderborn, 33095, Paderborn, Germany
Thomas Kemmerich
Department of Computer Science, University of Paderborn, 33095, Paderborn, Germany
Hans Kleine Büning


Editor information

Editors and Affiliations

AI & Computational Modeling Lab, Vrije Universiteit Brussel, Pleinlaan 2, 1050, Brussel, Belgium
Peter Vrancx
NASA Ames Research Park, Carnegie Mellon University, Building 23 (MS 23-11), P.O. Box 1, 94035, Moffett Field, CA, USA
Matthew Knudson
School of Computer Science, University of Waterloo, 200 University Avenue West, N2L 3G1, Waterloo, Ontario, Canada
Marek Grześ


About this paper

Cite this paper

Kemmerich, T., Kleine Büning, H. (2012). A Convergent Multiagent Reinforcement Learning Approach for a Subclass of Cooperative Stochastic Games. In: Vrancx, P., Knudson, M., Grześ, M. (eds) Adaptive and Learning Agents. ALA 2011. Lecture Notes in Computer Science, vol 7113. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28499-1_3


DOI: https://doi.org/10.1007/978-3-642-28499-1_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28498-4
Online ISBN: 978-3-642-28499-1
eBook Packages: Computer Science, Computer Science (R0)
