Abstract
We present new multiagent learning (MAL) algorithms with the general philosophy of ensuring policy convergence against some classes of opponents while otherwise guaranteeing high payoffs. We consider a three-class breakdown of opponent types: (eventually) stationary, self-play, and “other” (see Definition 4) agents. We start with ReDVaLeR, which achieves policy convergence against the first two types and no-regret payoffs against the third, but needs to know the type of each opponent. This serves as a baseline that delineates the difficulty of achieving these goals. We then show that a simple modification of ReDVaLeR yields a new algorithm, RVσ(t), that simultaneously achieves no-regret payoffs in all games and convergence to Nash equilibria in self-play (and hence, as a corollary of no-regret, convergence to best response against eventually stationary opponents) without knowing the opponent types, albeit in a smaller class of games than ReDVaLeR. RVσ(t) thus ensures the performance of a learner during the process of learning, as opposed to the performance of a learned behavior. We show that the regret expression of RVσ(t) can take a slightly better form than those of comparable algorithms such as GIGA and GIGA-WoLF, though, in contrast to theirs, our analysis is in continuous time. Moreover, experiments show that RVσ(t) can converge to an equilibrium in some cases where GIGA and GIGA-WoLF fail to converge, and to better equilibria in coordination games where GIGA and GIGA-WoLF converge to undesirable equilibria. This important class of coordination games also highlights why policy convergence, rather than high average payoff, is the key criterion for MAL in self-play. To our knowledge, this is also the first guaranteed policy convergence of a no-regret algorithm in the Shapley game.
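To make the no-regret criterion in the abstract concrete, the following is a minimal sketch — not the authors' RVσ(t), but plain GIGA-style projected gradient ascent (Zinkevich, 2003) run in self-play on the zero-sum matching pennies game. It tracks external regret against the best fixed action in hindsight; the step size η_t = 1/√t, the starting strategies, and all function names are illustrative assumptions.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)
    return np.maximum(v + theta, 0.0)

def giga_selfplay(A, x0, y0, T=5000):
    """GIGA-style projected gradient ascent in self-play on a zero-sum
    matrix game: row player gets x'Ay, column player gets -x'Ay."""
    x, y = np.asarray(x0, float), np.asarray(y0, float)
    cum_payoff = 0.0
    cum_grad = np.zeros(len(x))  # cumulative payoff of each pure row action
    for t in range(1, T + 1):
        eta = 1.0 / np.sqrt(t)
        gx, gy = A @ y, -A.T @ x          # each player's payoff gradient
        cum_payoff += x @ A @ y
        cum_grad += gx
        x = project_simplex(x + eta * gx)
        y = project_simplex(y + eta * gy)
    # external regret: best fixed row action in hindsight minus actual payoff
    regret = cum_grad.max() - cum_payoff
    return x, y, regret / T

matching_pennies = np.array([[1.0, -1.0], [-1.0, 1.0]])
x, y, avg_regret = giga_selfplay(matching_pennies, [0.8, 0.2], [0.3, 0.7])
```

As the abstract notes, vanishing average regret does not by itself give policy convergence: in games like matching pennies, gradient-ascent strategies can keep cycling around the mixed equilibrium even while average regret goes to zero, which is exactly the gap RVσ(t) is designed to close in self-play.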
References
Auer, P., Cesa-Bianchi, N., Freund, Y., & Schapire, R. E. (1995). Gambling in a rigged casino: The adversarial multi-armed bandit problem. In Proceedings of the thirty-sixth annual symposium on foundations of computer science (pp. 322–331). Milwaukee, WI: IEEE Computer Society Press.
Banerjee, B., & Peng, J. (2004). Performance bounded reinforcement learning in strategic interactions. In Proceedings of the nineteenth national conference on artificial intelligence (AAAI-04) (pp. 2–7). San Jose, CA: AAAI Press.
Bowling, M. (2005). Convergence and no-regret in multiagent learning. In Proceedings of NIPS 2004/5.
Bowling, M., & Veloso, M. (2001). Rational and convergent learning in stochastic games. In Proceedings of the seventeenth international joint conference on artificial intelligence (pp. 1021–1026). Seattle, WA.
Bowling, M., & Veloso, M. (2002). Multiagent learning using a variable learning rate. Artificial Intelligence, 136, 215–250.
Brafman, R. I., & Tennenholtz, M. (2002). R-max: A general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research, 3, 213–231.
Claus, C., & Boutilier, C. (1998). The dynamics of reinforcement learning in cooperative multiagent systems. In Proceedings of the fifteenth national conference on artificial intelligence (pp. 746–752). Menlo Park, CA: AAAI Press/MIT Press.
Conitzer, V., & Sandholm, T. (2003). AWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents. In Proceedings of the twentieth international conference on machine learning.
Cover, T. M., & Thomas, J. A. (1991). Elements of information theory. Wiley.
Flaxman, A., Kalai, A., & McMahan, H. B. (2005). Online convex optimization in the bandit setting: Gradient descent without a gradient. In Proceedings of the sixteenth annual ACM-SIAM symposium on discrete algorithms (SODA). (To appear.)
Freund, Y., & Schapire, R. E. (1999). Adaptive game playing using multiplicative weights. Games and Economic Behavior, 29, 79–103.
Fudenberg, D., & Levine, D. K. (1995). Consistency and cautious fictitious play. Journal of Economic Dynamics and Control, 19, 1065–1089.
Fudenberg, D., & Levine, D. K. (1998). The theory of learning in games. Cambridge, MA: MIT Press.
Greenwald, A., & Hall, K. (2002). Correlated Q-learning. In Proceedings of the AAAI symposium on collaborative learning agents.
Hart, S., & Mas-Colell, A. (2003). Uncoupled dynamics do not lead to Nash equilibrium. American Economic Review, 93(5), 1830–1836.
Hu, J., & Wellman, M. P. (2003). Nash Q-learning for general-sum stochastic games. Journal of Machine Learning Research, 4, 1039–1069.
Jafari, A., Greenwald, A., Gondek, D., & Ercal, G. (2001). On no-regret learning, fictitious play, and Nash equilibrium. In Proceedings of the eighteenth international conference on machine learning (pp. 226–233).
Littlestone, N., & Warmuth, M. (1994). The weighted majority algorithm. Information and Computation, 108, 212–261.
Littman, M. L. (1994). Markov games as a framework for multi-agent reinforcement learning. In Proceedings of the eleventh international conference on machine learning (pp. 157–163). San Mateo, CA: Morgan Kaufmann.
Littman, M. L. (2001). Friend-or-foe Q-learning in general-sum games. In Proceedings of the eighteenth international conference on machine learning, Williams College, MA, USA.
Littman, M. L., & Szepesvari, C. (1996). A generalized reinforcement learning model: Convergence and applications. In Proceedings of the thirteenth international conference on machine learning (pp. 310–318).
Nash, J. F. (1951). Non-cooperative games. Annals of Mathematics, 54, 286–295.
Nowak, M., & Sigmund, K. (1993). A strategy of win-stay, lose-shift that outperforms tit-for-tat in the prisoner’s dilemma game. Nature, 364, 56–58.
Owen, G. (1995). Game theory. UK: Academic Press.
Posch, M., & Brannath, W. (1997). Win-stay, lose-shift: A general learning rule for repeated normal form games. In Proceedings of the third international conference on computing in economics and finance, Stanford, CA, June 30–July 2, 1997.
Powers, R., & Shoham, Y. (2005). New criteria and a new algorithm for learning in multi-agent systems. In Proceedings of NIPS 2004/5.
Sandholm, T., & Crites, R. (1996). On multiagent Q-learning in a semi-competitive domain. In G. Weiß & S. Sen (Eds.), Adaptation and learning in multi-agent systems (pp. 191–205). Springer-Verlag.
Sen, S., Sekaran, M., & Hale, J. (1994). Learning to coordinate without sharing information. In Proceedings of the national conference on artificial intelligence (pp. 426–431). Menlo Park, CA: AAAI Press/MIT Press. (Also reprinted in M. N. Huhns & M. P. Singh (Eds.), Readings in agents (pp. 509–514). San Francisco, CA: Morgan Kaufmann, 1998.)
Shapley, L. S. (1974). A note on the Lemke–Howson algorithm. Mathematical Programming Study 1: Pivoting and Extensions (pp. 175–189).
Singh, S., Kearns, M., & Mansour, Y. (2000). Nash convergence of gradient dynamics in general-sum games. In Proceedings of the sixteenth conference on uncertainty in artificial intelligence (pp. 541–548).
Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge, MA: MIT Press.
Tan, M. (1993). Multi-agent reinforcement learning: Independent vs. cooperative agents. In Proceedings of the tenth international conference on machine learning (pp. 330–337).
Tesauro, G. (2004). Extending Q-learning to general adaptive multi-agent systems. In S. Thrun, L. Saul, & B. Schölkopf (Eds.), Advances in neural information processing systems (Vol. 16). Cambridge, MA: MIT Press.
Wang, X., & Sandholm, T. (2002). Reinforcement learning to play an optimal Nash equilibrium in team Markov games. In Advances in neural information processing systems 15 (NIPS).
Weinberg, M., & Rosenschein, J. S. (2004). Best-response multiagent learning in non-stationary environments. In Proceedings of the third international joint conference on autonomous agents and multiagent systems (AAMAS) (Vol. 2, pp. 506–513). New York, NY: ACM.
Zinkevich, M. (2003). Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the twentieth international conference on machine learning, Washington, DC.
Cite this article
Banerjee, B., Peng, J. Generalized multiagent learning with performance bound. Auton Agent Multi-Agent Syst 15, 281–312 (2007). https://doi.org/10.1007/s10458-007-9013-x