ABSTRACT
We present a new multiagent learning algorithm, RVσ(t), that can guarantee both no-regret performance (in all games) and policy convergence (in some games of arbitrary size). Unlike its predecessor ReDVaLeR, it (1) does not need to distinguish whether its opponents are in self-play or are otherwise non-stationary, and (2) is allowed to know its own portion of any equilibrium, which, we argue, leads to convergence in some games in addition to no-regret. Although the regret of RVσ(t) is analyzed in continuous time, we show that it grows more slowly than the regret of other no-regret techniques such as GIGA and GIGA-WoLF. We show that RVσ(t) can converge to coordinated behavior in coordination games, while GIGA and GIGA-WoLF may converge to poorly coordinated (mixed) behaviors.
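To make the comparison concrete, the following is a minimal sketch of GIGA-style self-play (gradient ascent on the expected payoff with a 1/√t step size and projection back onto the probability simplex, in the spirit of Zinkevich's algorithm), not the paper's RVσ(t) algorithm. The function names, the game matrix, and the horizon are illustrative assumptions. In a symmetric 2×2 coordination game started from uniform strategies, this dynamic stays at the uncoordinated mixed point, which is the kind of poorly coordinated behavior the abstract contrasts RVσ(t) against.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)
    return np.maximum(v + theta, 0.0)

def giga_self_play(payoff, T=2000, eta0=1.0):
    """GIGA-style self-play on a symmetric two-player matrix game.

    payoff[i, j] is the row player's reward for actions (i, j).
    Returns the final mixed strategies (x, y) of the two players.
    """
    n = payoff.shape[0]
    x = np.full(n, 1.0 / n)  # row player's mixed strategy
    y = np.full(n, 1.0 / n)  # column player's mixed strategy
    for t in range(1, T + 1):
        eta = eta0 / np.sqrt(t)        # decaying step size
        gx = payoff @ y                # gradient of row player's expected payoff
        gy = payoff.T @ x              # column player's gradient (symmetric game)
        x = project_simplex(x + eta * gx)
        y = project_simplex(y + eta * gy)
    return x, y

# A 2x2 coordination game: both players are rewarded only for matching actions.
coord = np.array([[1.0, 0.0], [0.0, 1.0]])
x, y = giga_self_play(coord)
# From the uniform start, both gradients are constant vectors, and adding a
# constant before projecting onto the simplex is a no-op: play stays mixed
# at (0.5, 0.5) instead of coordinating on a pure action pair.
```

The simplex projection keeps each iterate a valid mixed strategy; the fixed point at the uniform profile illustrates why plain gradient-ascent learners can settle on mixed rather than coordinated behavior in coordination games.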
- B. Banerjee and J. Peng. Performance bounded reinforcement learning in strategic interactions. In Proceedings of the 19th National Conference on Artificial Intelligence (AAAI-04), pages 2--7, San Jose, CA, 2004. AAAI Press.
- B. Banerjee and J. Peng. Convergence of no-regret learning in multiagent systems. In Proceedings of the First International Workshop on Learning and Adaptation in Multiagent Systems (LAMAS), Utrecht, The Netherlands, 2005. Held in conjunction with AAMAS-05.
- M. Bowling. Convergence and no-regret in multiagent learning. In Proceedings of NIPS 2004/5, 2005.
- M. Bowling and M. Veloso. Multiagent learning using a variable learning rate. Artificial Intelligence, 136:215--250, 2002.
- V. Conitzer and T. Sandholm. AWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents. In Proceedings of the 20th International Conference on Machine Learning, 2003.
- M. Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the 20th International Conference on Machine Learning, Washington DC, 2003.