Abstract
We present an anytime multiagent learning approach for satisfying any given optimality criterion in repeated-game self-play. We contrast our approach with classical learning approaches for repeated games, namely equilibrium learning, Pareto-efficient learning, and their variants. The comparison is made from a practical (engineering) standpoint, i.e., from the point of view of a multiagent system designer whose goal is to maximize the system's overall performance according to a given optimality criterion. Extensive experiments on a wide variety of repeated games demonstrate the efficacy of our approach.
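To make the setting concrete, the following is a minimal illustrative sketch (not the authors' algorithm) of repeated-game self-play evaluated against an optimality criterion. The game (Prisoner's Dilemma), the policies, and the "utilitarian" criterion (sum of the agents' average payoffs) are all assumptions chosen for illustration:

```python
# Illustrative sketch only: a repeated Prisoner's Dilemma played in
# self-play, scored by a hypothetical "utilitarian" optimality
# criterion (sum of the agents' average payoffs).

# Payoff bimatrix: keys are (agent 1's action, agent 2's action),
# values are (agent 1's payoff, agent 2's payoff).
PD = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def utilitarian(avg_payoffs):
    """Hypothetical optimality criterion: sum of average payoffs."""
    return sum(avg_payoffs)

def play_repeated(policy1, policy2, rounds=1000):
    """Play `rounds` stage games; return each agent's average payoff.

    Each policy maps the opponent's past action sequence to an action.
    """
    totals = [0.0, 0.0]
    hist1, hist2 = [], []  # each agent's own action history
    for _ in range(rounds):
        a1 = policy1(hist2)  # agent 1 observes agent 2's past actions
        a2 = policy2(hist1)
        r1, r2 = PD[(a1, a2)]
        totals[0] += r1
        totals[1] += r2
        hist1.append(a1)
        hist2.append(a2)
    return totals[0] / rounds, totals[1] / rounds

def tit_for_tat(opponent_moves):
    """Cooperate first, then copy the opponent's last action."""
    return "C" if not opponent_moves else opponent_moves[-1]

def always_defect(opponent_moves):
    return "D"

# Tit-for-tat self-play sustains mutual cooperation, which maximizes
# the utilitarian criterion here; mutual defection scores far lower.
coop_score = utilitarian(play_repeated(tit_for_tat, tit_for_tat))        # 6.0
defect_score = utilitarian(play_repeated(always_defect, always_defect))  # 2.0
```

An anytime learner in this framing would, at any interruption point, output the joint policy with the best criterion value found so far.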
© 2009 Springer-Verlag Berlin Heidelberg
Burkov, A., Chaib-draa, B. (2009). Anytime Self-play Learning to Satisfy Functional Optimality Criteria. In: Rossi, F., Tsoukias, A. (eds) Algorithmic Decision Theory. ADT 2009. Lecture Notes in Computer Science(), vol 5783. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04428-1_39
Print ISBN: 978-3-642-04427-4
Online ISBN: 978-3-642-04428-1