
Specification Aware Multi-Agent Reinforcement Learning

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13251)

Abstract

Engineering intelligent industrial systems is challenging due to high complexity and uncertainty with respect to domain dynamics and multiple agents. When such systems act autonomously, their choices and results must stay within specified bounds. Reinforcement learning (RL) is a promising way to find solutions that outperform known or handcrafted heuristics, but in industrial scenarios it is also crucial to prevent RL from inducing potentially undesired or even dangerous behavior. This paper considers specification alignment in industrial scenarios with multi-agent reinforcement learning (MARL). We propose to embed functional and non-functional requirements into the reward function, enabling the agents to learn to align with the specification. We evaluate our approach in a smart factory simulation representing an industrial lot-size-one production facility, where we train up to eight agents using DQN, VDN, and QMIX. Our results show that the proposed approach enables agents to satisfy a given set of requirements.
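The abstract does not state the exact reward formulation, so the following Python sketch is only a hypothetical illustration of the general idea of embedding functional and non-functional requirements into a shared reward signal. All names (Requirement, spec_aware_reward, the example predicates and weights) are assumptions for illustration, not the paper's actual method or API.

```python
# Hypothetical sketch: requirements as penalty terms on a shared MARL reward.
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Requirement:
    name: str
    satisfied: Callable[[Dict], bool]  # predicate over the environment state
    weight: float                      # penalty applied when violated

def spec_aware_reward(task_reward: float, state: Dict,
                      requirements: List[Requirement]) -> float:
    """Combine the task objective with penalties for specification violations."""
    penalty = sum(r.weight for r in requirements if not r.satisfied(state))
    return task_reward - penalty

# Example: one functional requirement (no items left unprocessed) and one
# non-functional requirement (machine utilisation above a threshold).
requirements = [
    Requirement("items_processed", lambda s: s["items_pending"] == 0, weight=1.0),
    Requirement("utilisation_ok", lambda s: s["utilisation"] >= 0.5, weight=0.5),
]

state = {"items_pending": 0, "utilisation": 0.3}
print(spec_aware_reward(task_reward=1.0, state=state, requirements=requirements))
# -> 0.5: the non-functional requirement is violated, so 0.5 is subtracted.
```

Under this kind of shaping, agents trained with value-based methods such as DQN, VDN, or QMIX receive lower returns whenever a requirement is violated, so maximizing return pushes their policies toward specification-conformant behavior.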



Author information


Corresponding author

Correspondence to Fabian Ritz.


Copyright information

© 2022 Springer Nature Switzerland AG

About this paper


Cite this paper

Ritz, F. et al. (2022). Specification Aware Multi-Agent Reinforcement Learning. In: Rocha, A.P., Steels, L., van den Herik, J. (eds) Agents and Artificial Intelligence. ICAART 2021. Lecture Notes in Computer Science (LNAI), vol. 13251. Springer, Cham. https://doi.org/10.1007/978-3-031-10161-8_1


  • DOI: https://doi.org/10.1007/978-3-031-10161-8_1


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-10160-1

  • Online ISBN: 978-3-031-10161-8

  • eBook Packages: Computer Science, Computer Science (R0)
