Centralized Norm Enforcement in Mixed-Motive Multiagent Reinforcement Learning

  • Conference paper
  • In: Coordination, Organizations, Institutions, Norms, and Ethics for Governance of Multi-Agent Systems XV (COINE 2022)

Abstract

Mixed-motive games comprise a subset of games in which individual and collective incentives are not entirely aligned. These games are relevant because they frequently occur in real-world and artificial societies, and their outcomes are often poor for the parties involved. Institutions and norms offer a good solution for governing mixed-motive systems; still, they are usually incorporated into the system in a distributed fashion, or they are unable to adjust dynamically to the needs of the environment at run-time. We propose a way of reaching socially good outcomes in mixed-motive multiagent reinforcement learning settings by enhancing the environment with a normative system controlled by an external reinforcement learning agent. With this proposal, we show that it is possible to reach social welfare in a mixed-motive system of self-interested agents using only traditional reinforcement learning agent architectures.
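
The architecture outlined in the abstract (a standard mixed-motive environment, a normative system layered on top of it, and an external reinforcement learning agent that tunes sanctions toward social welfare) can be pictured with a short toy sketch. Everything below is illustrative and hypothetical: the names are not from the paper, the player and regulator policies are hand-coded stand-ins for RL learners, and the snippet is not the authors' implementation (their actual code is linked in footnote 4). It only shows where the regulator's learning signal, the social welfare of the players, would come from.

    import random

    NUM_PLAYERS = 4
    HARVEST_QUOTA = 2.0                      # hypothetical norm: harvest at most this much per step
    SANCTION_LEVELS = [0.0, 0.5, 1.0, 2.0]   # fine per unit harvested above the quota

    class CommonsEnv:
        """Toy mixed-motive environment: a shared resource pool that regrows each step."""

        def __init__(self, stock=100.0, regrowth=1.05):
            self.stock = stock
            self.regrowth = regrowth

        def step(self, harvests, sanction):
            rewards = []
            for h in harvests:
                taken = min(h, self.stock)
                self.stock -= taken
                fine = sanction * max(0.0, taken - HARVEST_QUOTA)  # sanction for violating the norm
                rewards.append(taken - fine)
            self.stock *= self.regrowth
            return rewards

    def player_policy():
        """Stand-in for a self-interested RL learner: tends to over-harvest."""
        return random.uniform(1.0, 5.0)

    def regulator_policy(stock):
        """Stand-in for the external RL regulator: raises the sanction when the pool runs low."""
        return SANCTION_LEVELS[-1] if stock < 50.0 else SANCTION_LEVELS[0]

    env = CommonsEnv()
    for t in range(20):
        sanction = regulator_policy(env.stock)
        rewards = env.step([player_policy() for _ in range(NUM_PLAYERS)], sanction)
        welfare = sum(rewards)  # social welfare: the signal an RL regulator would learn from
        print(f"t={t:02d}  stock={env.stock:6.1f}  sanction={sanction:.1f}  welfare={welfare:5.1f}")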

Notes

  1. https://www.bbc.com/news/business-34324772.

  2. https://nypost.com/article/what-is-cancel-culture-breaking-down-the-toxic-online-trend/.

  3. By traditional RL agent architectures we mean architectures commonly used in other RL tasks, such as A2C [21]; a short illustrative snippet follows these notes.

  4. All relevant code and data for this project are available at https://github.com/rafacheang/social_dilemmas_regulation.

  5. https://www.nobelprize.org/prizes/economic-sciences/2009/ostrom/facts/.
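
As a concrete illustration of footnote 3, a "traditional" agent architecture such as A2C [21] is available off the shelf, for example in Stable-Baselines3 [28]. The snippet below is illustrative only: "CartPole-v1" is a placeholder single-agent environment, not the mixed-motive setting studied in the paper, and exact library versions may differ.

    # Illustrative only: an off-the-shelf A2C learner from Stable-Baselines3 [28].
    # "CartPole-v1" is a placeholder environment, not the paper's mixed-motive setting.
    from stable_baselines3 import A2C

    model = A2C("MlpPolicy", "CartPole-v1", verbose=1)
    model.learn(total_timesteps=10_000)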

References

  1. Bou, E., López-Sánchez, M., Rodríguez-Aguilar, J.A., Sichman, J.S.: Adapting autonomic electronic institutions to heterogeneous agent societies. In: Vouros, G., Artikis, A., Stathis, K., Pitt, J. (eds.) OAMAS 2008. LNCS (LNAI), vol. 5368, pp. 18–35. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-02377-4_2

  2. Bou, E., López-Sánchez, M., Rodríguez-Aguilar, J.A.: Towards self-configuration in autonomic electronic institutions. In: Noriega, P., Vázquez-Salceda, J., Boella, G., Boissier, O., Dignum, V., Fornara, N., Matson, E. (eds.) COIN 2006. LNCS (LNAI), vol. 4386, pp. 229–244. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-74459-7_15

  3. Brockman, G., et al.: OpenAI Gym. arXiv preprint arXiv:1606.01540 (2016)

  4. Cardoso, H.L., Oliveira, E.: Adaptive deterrence sanctions in a normative framework. In: Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology, pp. 36–43. IEEE Computer Society (2009)

  5. Castelfranchi, C.: Engineering social order. In: Omicini, A., Tolksdorf, R., Zambonelli, F. (eds.) ESAW 2000. LNCS (LNAI), vol. 1972, pp. 1–18. Springer, Heidelberg (2000). https://doi.org/10.1007/3-540-44539-0_1

  6. Conte, R.: Emergent (info)institutions. Cogn. Syst. Res. 2(2), 97–110 (2001). https://doi.org/10.1016/S1389-0417(01)00020-1

  7. Crawford, S.E.S., Ostrom, E.: A grammar of institutions. Am. Polit. Sci. Rev. 89(3), 582–600 (1995). https://doi.org/10.2307/2082975

  8. Dawes, R.M.: Social dilemmas. Annu. Rev. Psychol. 31(1), 169–193 (1980). https://doi.org/10.1146/annurev.ps.31.020180.001125

  9. Eccles, T., Hughes, E., Kramár, J., Wheelwright, S., Leibo, J.Z.: Learning reciprocity in complex sequential social dilemmas (2019)

  10. Esteva, M., de la Cruz, D., Rosell, B., Arcos, J.L., Rodríguez-Aguilar, J., Cuní, G.: Engineering open multi-agent systems as electronic institutions. In: Proceedings of the 19th National Conference on Artificial Intelligence, AAAI 2004, pp. 1010–1011. AAAI Press (2004)

  11. Esteva, M., Rodríguez-Aguilar, J.-A., Sierra, C., Garcia, P., Arcos, J.L.: On the formal specification of electronic institutions. In: Dignum, F., Sierra, C. (eds.) Agent Mediated Electronic Commerce. LNCS (LNAI), vol. 1991, pp. 126–147. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-44682-6_8

  12. Ghorbani, A., Ho, P., Bravo, G.: Institutional form versus function in a common property context: the credibility thesis tested through an agent-based model. Land Use Policy 102, 105237 (2021). https://doi.org/10.1016/j.landusepol.2020.105237. https://www.sciencedirect.com/science/article/pii/S0264837720325758

  13. Haarnoja, T., Zhou, A., Abbeel, P., Levine, S.: Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor. In: Dy, J., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 80, pp. 1861–1870. PMLR, 10–15 July 2018. https://proceedings.mlr.press/v80/haarnoja18b.html

  14. Hardin, G.: The tragedy of the commons. Science 162(3859), 1243–1248 (1968). https://doi.org/10.1126/science.162.3859.1243. https://science.sciencemag.org/content/162/3859/1243

  15. Hughes, E., et al.: Inequity aversion improves cooperation in intertemporal social dilemmas. In: Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 31. Curran Associates, Inc. (2018). https://proceedings.neurips.cc/paper/2018/file/7fea637fd6d02b8f0adf6f7dc36aed93-Paper.pdf

  16. Jones, A.J.I., Sergot, M.: On the characterization of law and computer systems: the normative systems perspective, pp. 275–307. Wiley, Chichester (1994)

  17. Kollock, P.: Social dilemmas: the anatomy of cooperation. Annu. Rev. Sociol. 24(1), 183–214 (1998). https://doi.org/10.1146/annurev.soc.24.1.183

  18. Lerer, A., Peysakhovich, A.: Maintaining cooperation in complex social dilemmas using deep reinforcement learning (2018)

  19. Littman, M.L.: Markov games as a framework for multi-agent reinforcement learning. In: Proceedings of the Eleventh International Conference on International Conference on Machine Learning, ICML 1994, pp. 157–163. Morgan Kaufmann Publishers Inc., San Francisco (1994)

  20. McKee, K.R., Gemp, I., McWilliams, B., Duéñez-Guzmán, E.A., Hughes, E., Leibo, J.Z.: Social diversity and social preferences in mixed-motive reinforcement learning. In: Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems, AAMAS 2020, pp. 869–877. International Foundation for Autonomous Agents and Multiagent Systems, Richland (2020)

  21. Mnih, V., et al.: Asynchronous methods for deep reinforcement learning. In: Balcan, M.F., Weinberger, K.Q. (eds.) Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1928–1937. PMLR, New York, 20–22 June 2016. https://proceedings.mlr.press/v48/mniha16.html

  22. Nardin, L.G.: An adaptive sanctioning enforcement model for normative multiagent systems. Ph.D. thesis, Universidade de São Paulo (2015)

  23. Neufeld, E., Bartocci, E., Ciabattoni, A., Governatori, G.: A normative supervisor for reinforcement learning agents. In: Platzer, A., Sutcliffe, G. (eds.) CADE 2021. LNCS (LNAI), vol. 12699, pp. 565–576. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-79876-5_32

  24. Olson, M.: The Logic of Collective Action: Public Goods and the Theory of Groups. Harvard Economic Studies, vol. 124, p. 176. Harvard University Press, Cambridge (1965). https://www.hup.harvard.edu/catalog.php?isbn=9780674537514

  25. Ostrom, E.: Coping with tragedies of the commons. Annu. Rev. Polit. Sci. 2(1), 493–535 (1999). https://doi.org/10.1146/annurev.polisci.2.1.493

  26. Ostrom, E.: Collective action and the evolution of social norms. J. Econ. Perspect. 14(3), 137–158 (2000). https://doi.org/10.1257/jep.14.3.137

  27. Pérolat, J., Leibo, J.Z., Zambaldi, V., Beattie, C., Tuyls, K., Graepel, T.: A multi-agent reinforcement learning model of common-pool resource appropriation. In: Guyon, I., et al. (eds.) Advances in Neural Information Processing Systems, vol. 30. Curran Associates, Inc. (2017). https://proceedings.neurips.cc/paper/2017/file/2b0f658cbffd284984fb11d90254081f-Paper.pdf

  28. Raffin, A., Hill, A., Gleave, A., Kanervisto, A., Ernestus, M., Dormann, N.: Stable-baselines3: reliable reinforcement learning implementations. J. Mach. Learn. Res. 22(268), 1–8 (2021). http://jmlr.org/papers/v22/20-1364.html

  29. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction, 2nd edn. The MIT Press, Cambridge (2018)

  30. Terry, J.K., et al.: PettingZoo: a standard API for multi-agent reinforcement learning. In: Advances in Neural Information Processing Systems (2021). https://proceedings.neurips.cc//paper/2021/file/7ed2d3454c5eea71148b11d0c25104ff-Paper.pdf

  31. Ullmann-Margalit, E.: The Emergence of Norms. Oxford University Press, Oxford (1977)

  32. Zhang, K., Yang, Z., Başar, T.: Multi-agent reinforcement learning: a selective overview of theories and algorithms. In: Vamvoudakis, K.G., Wan, Y., Lewis, F.L., Cansever, D. (eds.) Handbook of Reinforcement Learning and Control. SSDC, vol. 325, pp. 321–384. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-60990-0_12

  33. Zheng, S., et al.: The AI economist: improving equality and productivity with AI-driven tax policies (2020)

Acknowledgements

This research is being carried out with the support of Itaú Unibanco S.A., through the Programa de Bolsas Itaú (PBI) scholarship program, and is financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), Finance Code 001, Brazil.

Author information

Corresponding author

Correspondence to Rafael M. Cheang.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Cheang, R.M., Brandão, A.A.F., Sichman, J.S. (2022). Centralized Norm Enforcement in Mixed-Motive Multiagent Reinforcement Learning. In: Ajmeri, N., Morris Martin, A., Savarimuthu, B.T.R. (eds) Coordination, Organizations, Institutions, Norms, and Ethics for Governance of Multi-Agent Systems XV. COINE 2022. Lecture Notes in Computer Science, vol 13549. Springer, Cham. https://doi.org/10.1007/978-3-031-20845-4_8

  • DOI: https://doi.org/10.1007/978-3-031-20845-4_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20844-7

  • Online ISBN: 978-3-031-20845-4

  • eBook Packages: Computer Science, Computer Science (R0)
