ABSTRACT
The multiagent learning literature has investigated iterated two-player games to develop mechanisms that allow agents to learn to converge on Nash equilibrium strategy profiles. Such an equilibrium configuration implies that no player has an incentive to change its strategy if the other does not. In general-sum games, however, both players can often obtain a higher payoff if one of them chooses not to respond optimally to the other. By developing mutual trust, agents can avoid iterated best responses that lead to a Nash equilibrium with lower payoffs. In this paper we work with agents that select actions based on expected utility calculations incorporating the observed frequencies of the actions of the opponent(s). We augment these stochastically greedy agents with an action-revelation strategy that strategically reveals one's chosen action to the opponent in order to avoid worst-case, pessimistic moves. We argue that in certain situations such apparently risky revelation can indeed produce a better payoff than a non-revealing approach; in particular, it is possible to obtain Pareto-optimal solutions that dominate the Nash equilibrium. We present results over a large number of randomly generated payoff matrices of varying sizes and compare the payoffs of strategically revealing learners to the payoffs at Nash equilibrium.
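To make the baseline concrete, here is a minimal sketch (not the paper's algorithm) of the kind of stochastically greedy, frequency-based learner the abstract describes: each agent best-responds to the empirical mix of its opponent's past actions. Played against itself on the Prisoner's Dilemma (a hypothetical payoff matrix chosen for illustration), such iterated best responses settle on mutual defection, the Nash equilibrium that is Pareto-dominated by mutual cooperation; this is exactly the gap a trust-building or revelation mechanism aims to close.

```python
# Illustrative sketch: fictitious-play-style learner that maximizes expected
# utility against the observed frequencies of the opponent's past actions.
# Payoffs are the classic Prisoner's Dilemma; action 0 = cooperate, 1 = defect.

# Row player's payoff matrix: PD_ROW[my_action][opp_action]
PD_ROW = [[3, 0],
          [5, 1]]

def best_response(payoff, opp_counts):
    """Return the action maximizing expected payoff against the empirical
    distribution of the opponent's observed actions."""
    total = sum(opp_counts)
    if total == 0:                       # no observations yet: assume uniform
        freqs = [1 / len(opp_counts)] * len(opp_counts)
    else:
        freqs = [c / total for c in opp_counts]
    expected = [sum(p * f for p, f in zip(row, freqs)) for row in payoff]
    return max(range(len(expected)), key=expected.__getitem__)

# Two such learners playing each other in the symmetric game: both converge
# to mutual defection with payoff (1, 1), even though mutual cooperation's
# (3, 3) Pareto-dominates it.
counts_a, counts_b = [0, 0], [0, 0]      # each side's observations of the other
for _ in range(50):
    a = best_response(PD_ROW, counts_a)  # A's best response to B's history
    b = best_response(PD_ROW, counts_b)  # symmetric game: same matrix for B
    counts_a[b] += 1
    counts_b[a] += 1

print(a, b, PD_ROW[a][b], PD_ROW[b][a])  # -> 1 1 1 1 (mutual defection)
```

Because defection strictly dominates cooperation here, the greedy expected-utility rule locks both learners into the low-payoff equilibrium regardless of history, which motivates the paper's interest in strategic revelation.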