DOI: 10.1145/3356464.3357712
Research article

Achieving cooperation through deep multiagent reinforcement learning in sequential prisoner's dilemmas

Published: 13 October 2019

Abstract

The Iterated Prisoner's Dilemma has guided research on social dilemmas for decades, but it distinguishes between only two atomic actions: cooperate and defect. In real-world prisoner's dilemmas, these choices are temporally extended, and different strategies may correspond to different sequences of actions, reflecting varying degrees of cooperation. We introduce the Sequential Prisoner's Dilemma (SPD) game to better capture these characteristics, and we propose a deep multiagent reinforcement-learning approach for investigating the evolution of mutual cooperation in SPD games. Our approach consists of two phases. The first phase is offline: it synthesizes policies with different cooperation degrees and then trains a cooperation degree detection network. The second phase is online: the agent adaptively selects its policy based on the detected degree of opponent cooperation. We demonstrate the effectiveness of our approach in two representative 2D SPD games, the Apple-Pear game and the Fruit Gathering game. Experimental results show that our strategy avoids exploitation by exploitative opponents while achieving mutual cooperation with cooperative opponents.
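To make the two-phase scheme above concrete, the following is a minimal Python sketch, not the authors' implementation. Every name in it (graded_policy, detect_degree, policy_library, select_policy) is hypothetical: the graded policies stand in for the deep RL policies the offline phase would train, the cooperation degree detection network is replaced by a simple empirical cooperation frequency, and a repeated matrix-game prisoner's dilemma stands in for the paper's gridworld SPD environments.

import random

# Payoff constants for a one-shot prisoner's dilemma stand-in (T > R > P > S).
# The paper's SPD games (Apple-Pear, Fruit Gathering) are 2D gridworlds; a
# matrix game is used here only to keep the sketch self-contained.
R, S, T, P = 3.0, 0.0, 5.0, 1.0

def pd_rewards(a_self, a_opp):
    # Actions: 1 = cooperate, 0 = defect. Returns (own reward, opponent reward).
    table = {(1, 1): (R, R), (1, 0): (S, T), (0, 1): (T, S), (0, 0): (P, P)}
    return table[(a_self, a_opp)]

def graded_policy(degree):
    # Stand-in for an offline-trained policy: cooperate with probability `degree`.
    # In the paper these would be deep RL policies; the scalar is our simplification.
    return lambda: 1 if random.random() < degree else 0

def detect_degree(opp_history, window=20):
    # Stand-in for the cooperation degree detection network: the opponent's
    # empirical cooperation frequency over a recent window (0.5 prior if empty).
    recent = opp_history[-window:]
    return sum(recent) / len(recent) if recent else 0.5

# Offline phase (assumed): a small library of policies indexed by cooperation degree.
policy_library = {d: graded_policy(d) for d in (0.0, 0.25, 0.5, 0.75, 1.0)}

def select_policy(detected):
    # Online phase: reciprocate the detected degree by picking the nearest
    # library entry, so cooperative opponents are met with cooperation and
    # exploitative ones are not.
    return min(policy_library, key=lambda d: abs(d - detected))

# Online loop against a tit-for-tat opponent that mirrors our previous action.
opp_history, my_last, total = [], 1, 0.0
for step in range(200):
    my_action = policy_library[select_policy(detect_degree(opp_history))]()
    opp_action = my_last
    total += pd_rewards(my_action, opp_action)[0]
    opp_history.append(opp_action)
    my_last = my_action

print(f"average payoff vs. tit-for-tat: {total / 200:.2f}")

Swapping the opponent line for opp_action = 0 (always defect) drives the detected degree, and hence the selected policy, to zero, which is the exploitation-avoidance behavior the abstract describes; an always-cooperating opponent drives it to one.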


Cited By

  • (2023) Pursuit Path Planning for Multiple Unmanned Ground Vehicles Based on Deep Reinforcement Learning. Electronics 12(23), 4759. DOI: 10.3390/electronics12234759. Online publication date: 23-Nov-2023.
  • (2021) Balancing Rational and Other-Regarding Preferences in Cooperative-Competitive Environments. In Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems, 1536-1538. DOI: 10.5555/3463952.3464151. Online publication date: 3-May-2021.
  • (2021) Contrasting Centralized and Decentralized Critics in Multi-Agent Reinforcement Learning. In Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems, 844-852. DOI: 10.5555/3463952.3464053. Online publication date: 3-May-2021.


Published In

DAI '19: Proceedings of the First International Conference on Distributed Artificial Intelligence
October 2019, 85 pages
ISBN: 9781450376563
DOI: 10.1145/3356464

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

1. deep multiagent reinforcement learning
2. mutual cooperation
3. opponent model
4. sequential prisoner's dilemmas

