
Anytime Self-play Learning to Satisfy Functional Optimality Criteria

  • Conference paper
Algorithmic Decision Theory (ADT 2009)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5783)


Abstract

We present an anytime multiagent learning approach that satisfies any given optimality criterion in repeated-game self-play. Our approach stands in contrast to classical learning approaches for repeated games, namely equilibrium learning, Pareto-efficient learning, and their variants. The comparison is made from a practical (or engineering) standpoint, i.e., from the point of view of a multiagent system designer whose goal is to maximize the system’s overall performance according to a given optimality criterion. Extensive experiments in a wide variety of repeated games demonstrate the efficacy of our approach.
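
The abstract leaves the algorithm itself to the paper; the sketch below is only a hypothetical illustration of the problem setting, not the authors' method. It shows, in a repeated Prisoner's Dilemma under full self-play control, what "anytime" optimization of a user-supplied functional optimality criterion (e.g., utilitarian welfare or a Nash-bargaining-style product) can look like: candidate joint-action cycles are sampled and the best one found so far is available whenever the search is interrupted. The payoff matrix, the random-sampling search, and all function names are illustrative assumptions.

# Hypothetical illustration only -- NOT the algorithm from the paper:
# anytime self-play search over joint action cycles of a repeated 2x2 game,
# keeping the best cycle found so far under a user-supplied optimality criterion.
import random

# Stage-game payoff bimatrix (Prisoner's Dilemma): PAYOFFS[(a1, a2)] = (r1, r2)
PAYOFFS = {
    ("C", "C"): (3, 3), ("C", "D"): (0, 5),
    ("D", "C"): (5, 0), ("D", "D"): (1, 1),
}

def average_payoffs(cycle):
    """Average per-step payoffs when the joint action cycle is repeated."""
    r1 = sum(PAYOFFS[a][0] for a in cycle) / len(cycle)
    r2 = sum(PAYOFFS[a][1] for a in cycle) / len(cycle)
    return r1, r2

def utilitarian(r1, r2):
    """One possible functional optimality criterion: sum of average payoffs."""
    return r1 + r2

def nash_product(r1, r2, d1=1.0, d2=1.0):
    """Another criterion: Nash-bargaining-style product above disagreement payoffs."""
    return max(r1 - d1, 0.0) * max(r2 - d2, 0.0)

def anytime_self_play(criterion, horizon=4, budget=500, seed=0):
    """Randomly sample joint action cycles and yield the best one seen so far,
    so a valid answer is available whenever the caller stops iterating."""
    rng = random.Random(seed)
    joint_actions = list(PAYOFFS)                 # all joint actions of the stage game
    best_cycle, best_value = None, float("-inf")
    for _ in range(budget):
        cycle = tuple(rng.choice(joint_actions) for _ in range(horizon))
        value = criterion(*average_payoffs(cycle))
        if value > best_value:
            best_cycle, best_value = cycle, value
        yield best_cycle, best_value              # anytime: usable after every step

if __name__ == "__main__":
    for best_cycle, best_value in anytime_self_play(utilitarian):
        pass                                      # interrupt whenever time runs out
    print("best joint cycle:", best_cycle, "criterion value:", best_value)

In this toy game the utilitarian criterion is maximized by repeating (C, C); the sketch merely fixes the vocabulary used in the abstract (self-play, anytime behaviour, functional optimality criterion), while the paper's contribution is a learning algorithm for the general case.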

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Burkov, A., Chaib-draa, B. (2009). Anytime Self-play Learning to Satisfy Functional Optimality Criteria. In: Rossi, F., Tsoukias, A. (eds) Algorithmic Decision Theory. ADT 2009. Lecture Notes in Computer Science (LNAI), vol 5783. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04428-1_39

  • DOI: https://doi.org/10.1007/978-3-642-04428-1_39

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04427-4

  • Online ISBN: 978-3-642-04428-1

  • eBook Packages: Computer Science (R0)
