Exploring selfish reinforcement learning in repeated games with stochastic rewards

Verbeeck, Katja; Nowé, Ann; Parent, Johan; Tuyls, Karl

doi:10.1007/s10458-006-9007-0

Published: 10 November 2006

Volume 14, pages 239–269, (2007)

Autonomous Agents and Multi-Agent Systems

Katja Verbeeck¹, Ann Nowé¹, Johan Parent¹ & Karl Tuyls²

344 Accesses
25 Citations

Abstract

In this paper we introduce a new multi-agent reinforcement learning algorithm, called exploring selfish reinforcement learning (ESRL). ESRL allows agents to reach optimal solutions in repeated non-zero-sum games with stochastic rewards by using coordinated exploration. First, two ESRL algorithms are presented, one for common interest games and one for conflicting interest games. Both are based on the same idea: an agent explores by temporarily excluding some of its local actions from its private action space, which gives the team of agents the opportunity to look for better solutions in a reduced joint action space. At a later stage these two algorithms are combined into one generic algorithm that does not assume the type of the game is known in advance. ESRL is able to find the Pareto optimal solution in common interest games without communication. In conflicting interest games ESRL needs only limited communication to learn a fair periodical policy, resulting in a good overall policy. ESRL agents are independent in the sense that they base their decisions only on their own action choices and rewards. They are also flexible in learning different solution concepts, and they can handle stochastic, possibly delayed rewards as well as asynchronous action selection. A real-life experiment, adaptive load balancing of parallel applications, is included.
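
The exclusion idea described in the abstract can be illustrated with a minimal sketch for a two-agent common interest game. It is not the authors' implementation: the linear reward-inaction update, the phase length, the learning rate, the Bernoulli payoff matrix and all function names are assumptions chosen for illustration. Each agent updates its policy from its own action and reward only, and after every exploration phase it removes the action it settled on from its private action space.

```python
import random


def explore_phase(payoff, avail, rng, plays=2000, rate=0.05):
    """One exploration phase on the currently available actions.

    Assumption: each agent uses a linear reward-inaction update, which needs
    only the agent's own chosen action and the reward it received.
    """
    probs = [[1.0 / len(a)] * len(a) for a in avail]
    recent = []
    for t in range(plays):
        # Each agent samples an index into its own available-action list.
        idx = [rng.choices(range(len(p)), weights=p)[0] for p in probs]
        row, col = avail[0][idx[0]], avail[1][idx[1]]
        # Stochastic common reward: 1 with probability payoff[row][col], else 0.
        reward = 1.0 if rng.random() < payoff[row][col] else 0.0
        for k in range(2):
            probs[k] = [
                p + rate * reward * ((1.0 if i == idx[k] else 0.0) - p)
                for i, p in enumerate(probs[k])
            ]
        if t >= plays - 200:
            recent.append(reward)  # average payoff over the last 200 plays
    # The action each agent currently prefers most.
    chosen = tuple(
        avail[k][max(range(len(probs[k])), key=probs[k].__getitem__)]
        for k in range(2)
    )
    return chosen, sum(recent) / len(recent)


def esrl_common_interest(payoff, seed=0):
    """Alternate exploration phases with exclusion of the preferred action."""
    rng = random.Random(seed)
    n = len(payoff)
    avail = [list(range(n)), list(range(n))]  # private action spaces
    history = []
    while True:
        chosen, avg = explore_phase(payoff, avail, rng)
        history.append((avg, chosen))
        if any(len(a) == 1 for a in avail):
            break  # nothing left to exclude
        for k in range(2):
            avail[k].remove(chosen[k])  # temporary exclusion
    # Keep the joint action with the highest observed average payoff.
    return max(history), history


if __name__ == "__main__":
    game = [[0.9, 0.1, 0.1],
            [0.1, 0.5, 0.1],
            [0.1, 0.1, 0.7]]
    best, history = esrl_common_interest(game, seed=1)
    print("phases:", history)
    print("best:", best)
```

In this sketch the reduced joint action space shrinks by one action per agent after every phase, so a game with three actions per agent runs three phases before the best observed joint action is selected. Because rewards are identical for both agents in a common interest game, each agent could keep the phase average and make the final selection on its own; the sketch computes them jointly only for brevity.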


Author information

Authors and Affiliations

Computational Modeling Lab (COMO), Vrije Universiteit Brussel, Brussels, Belgium
Katja Verbeeck, Ann Nowé & Johan Parent
Institute for Knowledge and Agent Technology (IKAT), University of Maastricht, Maastricht, The Netherlands
Karl Tuyls

Corresponding author

Correspondence to Katja Verbeeck.

About this article

Cite this article

Verbeeck, K., Nowé, A., Parent, J. et al. Exploring selfish reinforcement learning in repeated games with stochastic rewards. Auton Agent Multi-Agent Syst 14, 239–269 (2007). https://doi.org/10.1007/s10458-006-9007-0

Issue Date: June 2007
DOI: https://doi.org/10.1007/s10458-006-9007-0
