DOI: 10.1145/3356464.3357712
Research article

Achieving cooperation through deep multiagent reinforcement learning in sequential prisoner's dilemmas

Published: 13 October 2019

Abstract

The Iterated Prisoner's Dilemma has guided research on social dilemmas for decades, but it distinguishes between only two atomic actions: cooperate and defect. In real-world prisoner's dilemmas, these choices are temporally extended, and different strategies may correspond to different sequences of actions, reflecting varying degrees of cooperation. We introduce the Sequential Prisoner's Dilemma (SPD) game to better capture these characteristics, and we propose a deep multiagent reinforcement-learning approach for investigating the evolution of mutual cooperation in SPD games. Our approach consists of two phases. The first phase is offline: it synthesizes policies with different cooperation degrees and then trains a cooperation degree detection network. The second phase is online: the agent adaptively selects its policy based on the detected degree of opponent cooperation. We demonstrate the effectiveness of our approach in two representative 2D SPD games, the Apple-Pear game and the Fruit Gathering game. Experimental results show that our strategy avoids exploitation by exploitative opponents while achieving mutual cooperation with cooperative opponents.
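To make the two-phase scheme above concrete, the following is a minimal Python sketch, not the authors' implementation. Every name in it (graded_policy, detect_degree, policy_library, select_policy) is hypothetical: the graded policies stand in for the deep RL policies the offline phase would train, the cooperation degree detection network is replaced by a simple empirical cooperation frequency, and a repeated matrix-game prisoner's dilemma stands in for the paper's gridworld SPD environments.

import random

# Payoff constants for a one-shot prisoner's dilemma stand-in (T > R > P > S).
# The paper's SPD games (Apple-Pear, Fruit Gathering) are 2D gridworlds; a
# matrix game is used here only to keep the sketch self-contained.
R, S, T, P = 3.0, 0.0, 5.0, 1.0

def pd_rewards(a_self, a_opp):
    # Actions: 1 = cooperate, 0 = defect. Returns (own reward, opponent reward).
    table = {(1, 1): (R, R), (1, 0): (S, T), (0, 1): (T, S), (0, 0): (P, P)}
    return table[(a_self, a_opp)]

def graded_policy(degree):
    # Stand-in for an offline-trained policy: cooperate with probability `degree`.
    # In the paper these would be deep RL policies; the scalar is our simplification.
    return lambda: 1 if random.random() < degree else 0

def detect_degree(opp_history, window=20):
    # Stand-in for the cooperation degree detection network: the opponent's
    # empirical cooperation frequency over a recent window (0.5 prior if empty).
    recent = opp_history[-window:]
    return sum(recent) / len(recent) if recent else 0.5

# Offline phase (assumed): a small library of policies indexed by cooperation degree.
policy_library = {d: graded_policy(d) for d in (0.0, 0.25, 0.5, 0.75, 1.0)}

def select_policy(detected):
    # Online phase: reciprocate the detected degree by picking the nearest
    # library entry, so cooperative opponents are met with cooperation and
    # exploitative ones are not.
    return min(policy_library, key=lambda d: abs(d - detected))

# Online loop against a tit-for-tat opponent that mirrors our previous action.
opp_history, my_last, total = [], 1, 0.0
for step in range(200):
    my_action = policy_library[select_policy(detect_degree(opp_history))]()
    opp_action = my_last
    total += pd_rewards(my_action, opp_action)[0]
    opp_history.append(opp_action)
    my_last = my_action

print(f"average payoff vs. tit-for-tat: {total / 200:.2f}")

Swapping the opponent line for opp_action = 0 (always defect) drives the detected degree, and hence the selected policy, to zero, which is the exploitation-avoidance behavior the abstract describes; an always-cooperating opponent drives it to one.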


Cited By

  • (2023) Pursuit Path Planning for Multiple Unmanned Ground Vehicles Based on Deep Reinforcement Learning. Electronics 12(23), 4759. DOI: 10.3390/electronics12234759. Online publication date: 23-Nov-2023.
  • (2021) Balancing Rational and Other-Regarding Preferences in Cooperative-Competitive Environments. In Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems, 1536-1538. DOI: 10.5555/3463952.3464151. Online publication date: 3-May-2021.
  • (2021) Contrasting Centralized and Decentralized Critics in Multi-Agent Reinforcement Learning. In Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems, 844-852. DOI: 10.5555/3463952.3464053. Online publication date: 3-May-2021.


Published In

DAI '19: Proceedings of the First International Conference on Distributed Artificial Intelligence
October 2019, 85 pages
ISBN: 9781450376563
DOI: 10.1145/3356464

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

1. deep multiagent reinforcement learning
2. mutual cooperation
3. opponent model
4. sequential prisoner's dilemmas

