Abstract
We consider the learning problem faced by two self-interested agents repeatedly playing a general-sum stage game. We assume that the players can observe each other’s actions but not the payoffs received by the other player. The concept of Nash equilibrium in repeated games provides an individually rational solution for playing such games, and it can be achieved by playing the Nash equilibrium strategy of the single-shot game in every iteration. Such a strategy, however, can lead to a Pareto-dominated outcome in games like the Prisoner’s Dilemma. We therefore prefer learning strategies that converge to a Pareto-optimal outcome that also yields a Nash equilibrium payoff in repeated two-player, n-action general-sum games; the Folk Theorem enables us to identify such outcomes. In this paper, we introduce the Conditional Joint Action Learner (CJAL), which learns the conditional probability of the opponent’s action given its own action and uses it to decide its next course of action. We empirically show that, under self-play and provided the payoff structure of the Prisoner’s Dilemma satisfies certain conditions, a CJAL learner using a random exploration strategy followed by completely greedy exploitation converges to a Pareto-optimal solution. We also show that such learning generates Pareto-optimal payoffs in a large majority of other two-player general-sum games. We compare the performance of CJAL with that of existing algorithms such as WoLF-PHC and JAL on all structurally distinct two-player conflict games with ordinal payoffs.
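To make the learning rule concrete, the sketch below simulates CJAL in self-play on a standard Prisoner’s Dilemma. It is a minimal illustration under stated assumptions, not the paper’s implementation: the payoff values, the class name CJALAgent, and the explore_steps parameter are chosen here for exposition. Each agent counts joint actions during a random exploration phase, estimates P(opponent action | own action) from those counts, and afterwards greedily plays the action with the highest expected payoff under that conditional distribution, continuing to update the counts as it plays.

```python
import random

ACTIONS = ["C", "D"]  # cooperate, defect

# Row player's payoffs: PAYOFF[own_action][opponent_action]
# (illustrative Prisoner's Dilemma values, not taken from the paper)
PAYOFF = {"C": {"C": 3, "D": 0},
          "D": {"C": 5, "D": 1}}

class CJALAgent:
    def __init__(self, explore_steps=1000):
        self.explore_steps = explore_steps  # length of the random exploration phase
        self.t = 0
        # Joint-action counts n(own action, opponent action)
        self.counts = {a: {b: 0 for b in ACTIONS} for a in ACTIONS}

    def _expected(self, a):
        # Expected payoff of own action a under the empirical estimate
        # of P(opponent action | own action = a).
        total = sum(self.counts[a].values())
        if total == 0:
            return 0.0
        return sum(self.counts[a][b] / total * PAYOFF[a][b] for b in ACTIONS)

    def choose(self):
        self.t += 1
        if self.t <= self.explore_steps:
            return random.choice(ACTIONS)        # random exploration
        return max(ACTIONS, key=self._expected)  # greedy exploitation

    def observe(self, own, opp):
        self.counts[own][opp] += 1               # update joint-action counts

# Self-play: two CJAL agents repeatedly play the stage game.
p1, p2 = CJALAgent(), CJALAgent()
for _ in range(5000):
    a1, a2 = p1.choose(), p2.choose()
    p1.observe(a1, a2)
    p2.observe(a2, a1)

print("joint action after learning:", p1.choose(), p2.choose())
```

Whether the greedy phase settles on mutual cooperation depends on the payoff conditions identified in the paper; the point of the sketch is only the conditional, rather than marginal, modeling of the opponent.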
References
Bowling M.H. and Veloso M.M. (2002). Multiagent learning using a variable learning rate. Artificial Intelligence 136(2): 215–250
Bowling M.H. and Veloso M.M. (2004). Existence of multiagent equilibria with limited agents. Journal of Artificial Intelligence Research 22: 353–384
Brams S.J. (1994). Theory of moves. Cambridge University Press, Cambridge, UK
Brown G.W. (1951). Iterative solution of games by fictitious play. In Activity analysis of production and allocation. Wiley, New York
Claus, C., & Boutilier, C. (1997). The dynamics of reinforcement learning in cooperative multiagent systems. In Collected papers from the AAAI-97 workshop on multiagent learning (pp. 13–18). AAAI.
Conitzer, V., & Sandholm, T. (2003). AWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents. In ICML (pp. 83–90).
Crandall, J. W., & Goodrich, M. A. (2005). Learning to compete, compromise, and cooperate in repeated general-sum games. In Proceedings of the nineteenth international conference on machine learning (pp. 161–168).
de Farias, D. P., & Megiddo, N. (2003). How to combine expert (and novice) advice when actions impact the environment? In NIPS.
Fudenberg D. and Levine D.K. (1998). The theory of learning in games. MIT Press, Cambridge, MA
Greenwald, A. R., & Hall, K. (2003). Correlated q-learning. In ICML (pp. 242–249).
Greenwald, A. R., & Jafari, A. (2003). A general class of no-regret learning algorithms and game-theoretic equilibria. In COLT (pp. 2–12).
Hu J. and Wellman M.P. (2003). Nash q-learning for general-sum stochastic games. Journal of Machine Learning Research 4: 1039–1069
Kalai, A., & Vempala, S. (2002). Geometric algorithms for online optimization. Technical Report MIT-LCS-TR-861, MIT Laboratory for Computer Science.
Kapetanakis, S., Kudenko, D., & Strens, M. (2004). Learning of coordination in cooperative multi-agent systems using commitment sequences. Artificial Intelligence and the Simulation of Behavior, 1(5).
Littlestone, N., & Warmuth, M. K. (1989). The weighted majority algorithm. In IEEE symposium on foundations of computer science (pp. 256–261).
Littman, M. L. (1994). Markov games as a framework for multi-agent reinforcement learning. In Proceedings of the eleventh international conference on machine learning, (pp. 157–163). San Mateo, CA: Morgan Kaufmann.
Littman, M. L. (2001). Friend-or-foe q-learning in general-sum games. In Proceedings of the eighteenth international conference on machine learning (pp. 322–328). San Francisco, CA: Morgan Kaufmann.
Littman, M. L., & Stone, P. (2001). Implicit negotiation in repeated games. In Intelligent agents VIII: Agent theories, architectures, and languages (pp. 393–404).
Littman M.L. and Stone P. (2005). A polynomial-time nash equilibrium algorithm for repeated games. Decision Support Systems 39: 55–66
Mundhe, M., & Sen, S. (1999). Evaluating concurrent reinforcement learners. IJCAI-99 workshop on agents that learn about, from and with other agents.
Panait L. and Luke S. (2005). Cooperative multi-agent learning: The state of the art. Autonomous Agents and Multi-Agent Systems 11(3): 387–434
Sandholm T.W. and Crites R.H. (1995). Multiagent reinforcement learning and iterated prisoner’s dilemma. Biosystems Journal 37: 147–166
Sekaran, M., & Sen, S. (1994). Learning with friends and foes. In Sixteenth annual conference of the cognitive science society, (pp. 800–805). Hillsdale, NJ: Lawrence Erlbaum Associates, Publishers.
Sen, S., Mukherjee, R., & Airiau, S. (2003). Towards a pareto-optimal solution in general-sum games. In Proceedings of the second international joint conference on autonomous agents and multiagent systems (pp. 153–160). New York, NY: ACM Press.
Mas-Colell A. and Hart S. (2001). A general class of adaptive strategies. Journal of Economic Theory 98(1): 26–54
Singh, S. P., Kearns, M. J., & Mansour, Y. (2000). Nash convergence of gradient dynamics in general-sum games. In UAI (pp. 541–548).
Stimpson, J. L., Goodrich, M. A., & Walters, L. C. (2001). Satisficing and learning cooperation in the prisoner’s dilemma. In Proceedings of the seventeenth international joint conference on artificial intelligence (pp. 535–540).
Tuyls K. and Nowé A. (2006). Evolutionary game theory and multi-agent reinforcement learning. The Knowledge Engineering Review 20(1): 63–90
Verbeeck, K., Nowé, A., Lenaerts, T., & Parent, J. (2002). Learning to reach the Pareto optimal Nash equilibrium as a team. In LNAI 2557: Proceedings of the fifteenth Australian joint conference on artificial intelligence (pp. 407–418). Springer-Verlag.
Vidal J.M. and Durfee E.H. (2003). Predicting the expected behavior of agents that learn about agents: the CLRI framework. Autonomous Agents and Multi-Agent Systems 6(1): 77–107
Weiß, G. (1993). Learning to coordinate actions in multi-agent systems. In Proceedings of the international joint conference on artificial intelligence (pp. 311–316).
Cite this article
Banerjee, D., Sen, S. Reaching pareto-optimality in prisoner’s dilemma using conditional joint action learning. Auton Agent Multi-Agent Syst 15, 91–108 (2007). https://doi.org/10.1007/s10458-007-0020-8