Abstract
We introduce the Laser Learning Environment (LLE), a collaborative multi-agent reinforcement learning environment where coordination is key. In LLE, agents depend on each other to make progress (interdependence), must jointly take specific sequences of actions to succeed (perfect coordination), and accomplishing those joint actions does not yield any intermediate reward (zero-incentive dynamics). The challenge of such problems lies in the difficulty of escaping state space bottlenecks caused by interdependence steps, since escaping those bottlenecks is not rewarded. We test multiple state-of-the-art value-based MARL algorithms against LLE and show that they consistently fail at the collaborative task because of their inability to escape state space bottlenecks, even though they successfully achieve perfect coordination. We show that Q-learning extensions such as prioritised experience replay and n-step returns hinder exploration in environments with zero-incentive dynamics, and find that intrinsic curiosity with random network distillation is not sufficient to escape those bottlenecks. We demonstrate the need for novel methods to solve this problem and the relevance of LLE as a cooperative MARL benchmark.
Notes
1. The code is available at https://github.com/yamoling/bnaic-2023-lle.
Acknowledgements
Raphaël Avalos is supported by the FWO (Research Foundation – Flanders) under grant 11F5721N. Tom Lenaerts is supported by an FWO project (grant number G054919N) and two FRS-FNRS PDR projects (grant numbers 31257234 and 40007793). He is furthermore supported by the Service Public de Wallonie Recherche under grant no. 2010235-ariac by digitalwallonia4.ai. Ann Nowé and Tom Lenaerts are also supported by the Flemish Government through the AI Research Program and by TAILOR, a project funded by the EU Horizon 2020 research and innovation programme under GA No. 952215.
Appendices
A Hyperparameters
A hyperparameter search was performed with VDN over a combination of batch sizes (32, 64 and 128 transitions), memory sizes (50k, 100k and 200k transitions) and training intervals (1 and 5). We then performed a hyperparameter search for prioritised experience replay over a combination of \(\alpha \) (0.3, 0.4, 0.5, 0.6, 0.7, 0.8) and \(\beta \) (0.3, 0.4, 0.5, 0.6, 0.7, 0.8) values. For random network distillation, we explored update ratios \(p \in \{0, 0.25, 0.5, 0.75\}\).
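For reference, these grids amount to a plain cartesian-product sweep. The sketch below (Python, with illustrative variable names that are not taken from our code base) enumerates the combinations listed above.

```python
from itertools import product

# Hyperparameter grids described above (identifier names are illustrative,
# not the exact ones used in our code base).
vdn_grid = {
    "batch_size": [32, 64, 128],                  # transitions per batch
    "memory_size": [50_000, 100_000, 200_000],    # replay buffer capacity
    "train_interval": [1, 5],                     # steps between updates
}
per_grid = {
    "alpha": [0.3, 0.4, 0.5, 0.6, 0.7, 0.8],      # prioritisation exponent
    "beta": [0.3, 0.4, 0.5, 0.6, 0.7, 0.8],       # importance-sampling exponent
}
rnd_update_ratios = [0.0, 0.25, 0.5, 0.75]        # RND predictor update ratio p


def combinations(grid: dict) -> list[dict]:
    """Enumerate every configuration in the cartesian product of the grid."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*grid.values())]


for config in combinations(vdn_grid):
    ...  # train VDN with this configuration and record score and exit rate
```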
B Neural Network Architectures
B.1 Q-Network
The Q-network consists of two parts joined by a flattening step: a convolutional neural network of three layers, whose output is flattened and fed into a network of three linear layers. The architecture is detailed in Table 3.
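As an illustration, a PyTorch-style sketch of this kind of architecture is given below. The layer sizes (channel counts, kernel sizes, hidden width) are placeholders; the exact values are those listed in Table 3.

```python
import torch
from torch import nn


class QNetwork(nn.Module):
    """Three convolutional layers, a flattening step, then three linear layers.

    All sizes below are placeholders; the exact values are given in Table 3.
    """

    def __init__(self, in_channels: int, n_actions: int):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3), nn.ReLU(),
        )
        self.mlp = nn.Sequential(
            nn.LazyLinear(128), nn.ReLU(),  # infers the flattened CNN output size
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        features = self.cnn(obs)
        return self.mlp(features.flatten(start_dim=1))
```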
B.2 Random Network Distillation
The random network used to compute the intrinsic reward is a convolutional network similar to the Q-network and is depicted in Table 4. The frozen random network (the target) consists of the first part of the table, with an output of size 512. The optimised network (the predictor) has an additional tail with one ReLU activation and one linear layer.
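The sketch below illustrates this setup (PyTorch-style, with the same placeholder sizes as above, not our exact implementation): a frozen random target producing a 512-dimensional embedding, a predictor with an extra ReLU and linear tail, and the prediction error used as the intrinsic reward.

```python
import torch
from torch import nn


def make_trunk(in_channels: int, out_dim: int = 512) -> nn.Sequential:
    """Convolutional trunk ending in an embedding of size out_dim (placeholder sizes)."""
    return nn.Sequential(
        nn.Conv2d(in_channels, 32, kernel_size=3), nn.ReLU(),
        nn.Conv2d(32, 64, kernel_size=3), nn.ReLU(),
        nn.Flatten(),
        nn.LazyLinear(out_dim),
    )


class RND(nn.Module):
    """Frozen random target network and trained predictor network."""

    def __init__(self, in_channels: int):
        super().__init__()
        self.target = make_trunk(in_channels)     # frozen, never optimised
        self.predictor = nn.Sequential(           # same trunk + ReLU + linear tail
            make_trunk(in_channels), nn.ReLU(), nn.Linear(512, 512),
        )
        for param in self.target.parameters():
            param.requires_grad = False

    def intrinsic_reward(self, obs: torch.Tensor) -> torch.Tensor:
        """Prediction error of the predictor, used as the intrinsic reward."""
        with torch.no_grad():
            target_features = self.target(obs)
        return (self.predictor(obs) - target_features).pow(2).mean(dim=-1)
```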
C Results of n-Step Returns with VDN
We plot in Fig. 6 the score and exit rate over the course of training on level 6, illustrated in Fig. 1. The agents are trained with VDN and n-step returns, using the hyperparameters shown in Table 2. These results show that higher values of n yield worse results, as discussed in Sect. 5.3.
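As a reminder of what varying n changes, the n-step return target has the standard textbook form below (a minimal sketch, not our exact implementation).

```python
def n_step_target(rewards: list[float], bootstrap_value: float, gamma: float, n: int) -> float:
    """Discounted sum of the next n rewards plus a bootstrapped value n steps ahead."""
    horizon = min(n, len(rewards))
    target = sum(gamma ** k * rewards[k] for k in range(horizon))
    if len(rewards) >= n:  # only bootstrap when the episode did not end within n steps
        target += gamma ** n * bootstrap_value
    return target
```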
D Maps Provided by LLE
LLE comes with six predefined levels illustrated in Fig. 7.