Centralized Norm Enforcement in Mixed-Motive Multiagent Reinforcement Learning

Cheang, Rafael M.; Brandão, Anarosa A. F.; Sichman, Jaime S.

doi:10.1007/978-3-031-20845-4_8

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13549)

Included in the following conference series:

International Workshop on Coordination, Organizations, Institutions, Norms, and Ethics for Governance of Multi-Agent Systems

Abstract

Mixed-motive games comprise a subset of games in which individual and collective incentives are not entirely aligned. These games are relevant because they frequently occur in real-world and artificial societies, and their outcome is often bad for the involved parties. Institutions and norms offer a good solution for governing mixed-motive systems. Still, they are usually incorporated into the system in a distributed fashion, or they are not able to dynamically adjust to the needs of the environment at run-time. We propose a way of reaching socially good outcomes in mixed-motive multiagent reinforcement learning settings by enhancing the environment with a normative system controlled by an external reinforcement learning agent. By adopting this proposal, we show it is possible to reach social welfare in a mixed-motive system of self-interested agents using only traditional reinforcement learning agent architectures.
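The abstract does not specify the environment or the normative system, so the following is only a toy sketch of the general idea. It assumes a hypothetical one-shot commons game (`CommonsGame`, with made-up parameters `gain`, `damage`, and `fine`) in which over-harvesting is individually rational but collectively harmful. A central regulator chooses a sanction from outside the game. A plain search over candidate fines stands in for the external reinforcement learning agent described above. None of the names or numbers come from the paper.

```python
from dataclasses import dataclass


@dataclass
class CommonsGame:
    """Hypothetical N-player commons: each agent restrains (0) or over-harvests (1)."""
    n_agents: int = 4
    gain: float = 3.0    # private gain from over-harvesting (assumed value)
    damage: float = 5.0  # damage per over-harvest, shared equally by all agents
    fine: float = 0.0    # sanction set by the central regulator (assumed knob)

    def payoffs(self, actions):
        # Every agent bears an equal share of the total damage.
        shared_cost = self.damage * sum(actions) / self.n_agents
        return [a * (self.gain - self.fine) - shared_cost for a in actions]

    def best_response(self, others):
        # Action that maximizes one agent's own payoff, given the others' actions.
        return max((0, 1), key=lambda a: self.payoffs([a] + list(others))[0])

    def welfare(self, actions):
        # Social welfare measured as the sum of individual payoffs.
        return sum(self.payoffs(actions))


def equilibrium(game):
    # Payoffs are linear, so the best response is the same whatever others do;
    # all agents therefore end up playing the same action.
    action = game.best_response([0] * (game.n_agents - 1))
    return [action] * game.n_agents


def regulate(candidate_fines):
    # Stand-in for the learned regulator: pick the fine whose induced
    # equilibrium yields the highest social welfare.
    def induced_welfare(fine):
        game = CommonsGame(fine=fine)
        return game.welfare(equilibrium(game))
    return max(candidate_fines, key=induced_welfare)


if __name__ == "__main__":
    unregulated = CommonsGame()
    print("no fine:", equilibrium(unregulated), unregulated.welfare(equilibrium(unregulated)))
    best = regulate([0.0, 1.0, 2.0, 3.0])
    regulated = CommonsGame(fine=best)
    print("fine", best, ":", equilibrium(regulated), regulated.welfare(equilibrium(regulated)))
```

With these assumed numbers, each agent gains 1.75 by over-harvesting regardless of what the others do, yet universal over-harvesting leaves every agent worse off than universal restraint. That is the misalignment between individual and collective incentives the abstract refers to. A fine above 1.75 removes the private incentive, which is why the search settles on the smallest candidate that clears that threshold.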

Notes

1. https://www.bbc.com/news/business-34324772.
2. https://nypost.com/article/what-is-cancel-culture-breaking-down-the-toxic-online-trend/.
3. By traditional RL agent architectures, we mean architectures commonly used in other RL tasks, such as A2C [21].
4. All relevant code and data for this project are available at https://github.com/rafacheang/social_dilemmas_regulation.
5. https://www.nobelprize.org/prizes/economic-sciences/2009/ostrom/facts/.

References

Bou, E., López-Sánchez, M., Rodríguez-Aguilar, J.A., Sichman, J.S.: Adapting autonomic electronic institutions to heterogeneous agent societies. In: Vouros, G., Artikis, A., Stathis, K., Pitt, J. (eds.) OAMAS 2008. LNCS (LNAI), vol. 5368, pp. 18–35. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-02377-4_2
Bou, E., López-Sánchez, M., Rodríguez-Aguilar, J.A.: Towards self-configuration in autonomic electronic institutions. In: Noriega, P., Vázquez-Salceda, J., Boella, G., Boissier, O., Dignum, V., Fornara, N., Matson, E. (eds.) COIN 2006. LNCS (LNAI), vol. 4386, pp. 229–244. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-74459-7_15
Brockman, G., et al.: OpenAI Gym. arXiv preprint arXiv:1606.01540 (2016)
Cardoso, H.L., Oliveira, E.: Adaptive deterrence sanctions in a normative framework. In: Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology, pp. 36–43. IEEE Computer Society (2009)
Castelfranchi, C.: Engineering social order. In: Omicini, A., Tolksdorf, R., Zambonelli, F. (eds.) ESAW 2000. LNCS (LNAI), vol. 1972, pp. 1–18. Springer, Heidelberg (2000). https://doi.org/10.1007/3-540-44539-0_1
Conte, R.: Emergent (info)institutions. Cogn. Syst. Res. 2(2), 97–110 (2001). https://doi.org/10.1016/S1389-0417(01)00020-1
Crawford, S.E.S., Ostrom, E.: A grammar of institutions. Am. Polit. Sci. Rev. 89(3), 582–600 (1995). https://doi.org/10.2307/2082975
Dawes, R.M.: Social dilemmas. Annu. Rev. Psychol. 31(1), 169–193 (1980). https://doi.org/10.1146/annurev.ps.31.020180.001125
Eccles, T., Hughes, E., Kramár, J., Wheelwright, S., Leibo, J.Z.: Learning reciprocity in complex sequential social dilemmas (2019)
Esteva, M., de la Cruz, D., Rosell, B., Arcos, J.L., Rodríguez-Aguilar, J., Cuní, G.: Engineering open multi-agent systems as electronic institutions. In: Proceedings of the 19th National Conference on Artificial Intelligence, AAAI 2004, pp. 1010–1011. AAAI Press (2004)
Esteva, M., Rodríguez-Aguilar, J.-A., Sierra, C., Garcia, P., Arcos, J.L.: On the formal specification of electronic institutions. In: Dignum, F., Sierra, C. (eds.) Agent Mediated Electronic Commerce. LNCS (LNAI), vol. 1991, pp. 126–147. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-44682-6_8
Ghorbani, A., Ho, P., Bravo, G.: Institutional form versus function in a common property context: the credibility thesis tested through an agent-based model. Land Use Policy 102, 105237 (2021). https://doi.org/10.1016/j.landusepol.2020.105237. https://www.sciencedirect.com/science/article/pii/S0264837720325758
Haarnoja, T., Zhou, A., Abbeel, P., Levine, S.: Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor. In: Dy, J., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 80, pp. 1861–1870. PMLR, 10–15 July 2018. https://proceedings.mlr.press/v80/haarnoja18b.html
Hardin, G.: The tragedy of the commons. Science 162(3859), 1243–1248 (1968). https://doi.org/10.1126/science.162.3859.1243. https://science.sciencemag.org/content/162/3859/1243
Hughes, E., et al.: Inequity aversion improves cooperation in intertemporal social dilemmas. In: Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 31. Curran Associates, Inc. (2018). https://proceedings.neurips.cc/paper/2018/file/7fea637fd6d02b8f0adf6f7dc36aed93-Paper.pdf
Jones, A.J.I., Sergot, M.: On the characterization of law and computer systems: the normative systems perspective, pp. 275–307. Wiley, Chichester (1994)
Kollock, P.: Social dilemmas: the anatomy of cooperation. Annu. Rev. Sociol. 24(1), 183–214 (1998). https://doi.org/10.1146/annurev.soc.24.1.183
Lerer, A., Peysakhovich, A.: Maintaining cooperation in complex social dilemmas using deep reinforcement learning (2018)
Littman, M.L.: Markov games as a framework for multi-agent reinforcement learning. In: Proceedings of the Eleventh International Conference on Machine Learning, ICML 1994, pp. 157–163. Morgan Kaufmann Publishers Inc., San Francisco (1994)
McKee, K.R., Gemp, I., McWilliams, B., Duéñez-Guzmán, E.A., Hughes, E., Leibo, J.Z.: Social diversity and social preferences in mixed-motive reinforcement learning. In: Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems, AAMAS 2020, pp. 869–877. International Foundation for Autonomous Agents and Multiagent Systems, Richland (2020)
Mnih, V., et al.: Asynchronous methods for deep reinforcement learning. In: Balcan, M.F., Weinberger, K.Q. (eds.) Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1928–1937. PMLR, New York, 20–22 June 2016. https://proceedings.mlr.press/v48/mniha16.html
Nardin, L.G.: An adaptive sanctioning enforcement model for normative multiagent systems. Ph.D. thesis, Universidade de São Paulo (2015)
Neufeld, E., Bartocci, E., Ciabattoni, A., Governatori, G.: A normative supervisor for reinforcement learning agents. In: Platzer, A., Sutcliffe, G. (eds.) CADE 2021. LNCS (LNAI), vol. 12699, pp. 565–576. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-79876-5_32
Olson, M.: The Logic of Collective Action: Public Goods and the Theory of Groups. Harvard Economic Studies, vol. 124, p. 176. Harvard University Press, Cambridge (1965). https://www.hup.harvard.edu/catalog.php?isbn=9780674537514
Ostrom, E.: Coping with tragedies of the commons. Annu. Rev. Polit. Sci. 2(1), 493–535 (1999). https://doi.org/10.1146/annurev.polisci.2.1.493
Ostrom, E.: Collective action and the evolution of social norms. J. Econ. Perspect. 14(3), 137–158 (2000). https://doi.org/10.1257/jep.14.3.137
Pérolat, J., Leibo, J.Z., Zambaldi, V., Beattie, C., Tuyls, K., Graepel, T.: A multi-agent reinforcement learning model of common-pool resource appropriation. In: Guyon, I., et al. (eds.) Advances in Neural Information Processing Systems, vol. 30. Curran Associates, Inc. (2017). https://proceedings.neurips.cc/paper/2017/file/2b0f658cbffd284984fb11d90254081f-Paper.pdf
Raffin, A., Hill, A., Gleave, A., Kanervisto, A., Ernestus, M., Dormann, N.: Stable-baselines3: reliable reinforcement learning implementations. J. Mach. Learn. Res. 22(268), 1–8 (2021). http://jmlr.org/papers/v22/20-1364.html
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction, 2nd edn. The MIT Press, Cambridge (2018)
Terry, J.K., et al.: PettingZoo: a standard API for multi-agent reinforcement learning. In: Advances in Neural Information Processing Systems (2021). https://proceedings.neurips.cc//paper/2021/file/7ed2d3454c5eea71148b11d0c25104ff-Paper.pdf
Ullmann-Margalit, E.: The Emergence of Norms. Oxford University Press, Oxford (1977)
Zhang, K., Yang, Z., Başar, T.: Multi-agent reinforcement learning: a selective overview of theories and algorithms. In: Vamvoudakis, K.G., Wan, Y., Lewis, F.L., Cansever, D. (eds.) Handbook of Reinforcement Learning and Control. SSDC, vol. 325, pp. 321–384. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-60990-0_12
Zheng, S., et al.: The AI economist: improving equality and productivity with AI-driven tax policies (2020)

Acknowledgements

This research is being carried out with the support of Itaú Unibanco S.A., through the scholarship program of Programa de Bolsas Itaú (PBI), and it is also financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), Finance Code 001, Brazil.

Author information

Authors and Affiliations

Laboratório de Técnicas Inteligentes (LTI), Universidade de São Paulo (USP), São Paulo, Brazil
Rafael M. Cheang, Anarosa A. F. Brandão & Jaime S. Sichman
Centro de Ciência de Dados (C2D), Universidade de São Paulo (USP), São Paulo, Brazil
Rafael M. Cheang

Authors

Rafael M. Cheang
Anarosa A. F. Brandão
Jaime S. Sichman

Corresponding author

Correspondence to Rafael M. Cheang.

Editor information

Editors and Affiliations

University of Bristol, Bristol, UK
Nirav Ajmeri
University of Bath, Bath, UK
Andreasa Morris Martin
University of Otago, Dunedin, New Zealand
Bastin Tony Roy Savarimuthu

About this paper

Cite this paper

Cheang, R.M., Brandão, A.A.F., Sichman, J.S. (2022). Centralized Norm Enforcement in Mixed-Motive Multiagent Reinforcement Learning. In: Ajmeri, N., Morris Martin, A., Savarimuthu, B.T.R. (eds) Coordination, Organizations, Institutions, Norms, and Ethics for Governance of Multi-Agent Systems XV. COINE 2022. Lecture Notes in Computer Science, vol 13549. Springer, Cham. https://doi.org/10.1007/978-3-031-20845-4_8

DOI: https://doi.org/10.1007/978-3-031-20845-4_8
Published: 24 November 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20844-7
Online ISBN: 978-3-031-20845-4
eBook Packages: Computer Science, Computer Science (R0)
