Abstract
Ökolopoly is a serious game developed by biochemist Frederic Vester with the goal of enhancing the understanding of interactions in complex systems. Due to its vast observation and action spaces, it poses a challenge for Deep Reinforcement Learning (DRL). In this paper, we make the board game available as a reinforcement learning environment and compare different methods of making the large spaces manageable. Our aim is to determine the conditions under which DRL agents are able to learn this game from self-play. To this end, we implement various wrappers to reduce the observation and action spaces and to change the reward structure. We train PPO, SAC, and TD3 agents on combinations of these wrappers and compare their performance. We analyze the contribution of different representations of the observation and action spaces to successful learning, as well as the possibility of steering the DRL agents’ gameplay by shaping reward functions.
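To make the wrapper idea concrete, here is a minimal, hypothetical sketch (not the authors' actual implementation; the class name and `scale` parameter are ours) of one common way to shrink a large discrete action space: let the agent act in a small continuous box and map its output to the game's integer action points.

```python
import numpy as np

class ContinuousToDiscreteActions:
    """Hypothetical sketch of a space-reducing wrapper (not the paper's
    exact implementation): the agent emits actions in [-1, 1]^n, and the
    wrapper scales and rounds them to the game's integer action points."""

    def __init__(self, env, scale=5):
        self.env = env          # the underlying game environment
        self.scale = scale      # assumed maximum action points per dimension

    def step(self, action):
        # map the continuous agent output to discrete game actions
        discrete = np.round(np.asarray(action) * self.scale).astype(int)
        return self.env.step(discrete)
```

A wrapper of this kind leaves the environment itself untouched, so different reductions can be swapped in and compared, which is the experimental pattern the paper describes.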
This research was supported by the research training group “Dataninja” (Trustworthy AI for Seamless Problem Solving: Next Generation Intelligence Joins Robust Data Analysis) funded by the German federal state of North Rhine-Westphalia.
Notes
- 1.
An advanced version of the game provides optional “event cards” to be drawn every five rounds. We ignore this advanced version in our implementation.
- 2.
- 3.
The sixth dimension \(a_{5}' \in [-1,1]\) (modifier for Population Growth g, Sect. 3.1) is multiplied by 5, rounded, and then appended to \(\textbf{a}\).
- 4.
As an example, the string 020101 encodes the following distribution of action points: one third of the action points is added to Education, and two thirds are deducted from Production (the rightmost digit 1 indicates a deduction).
- 5.
Timing experiments were performed on a system with an Intel® Core™ i7-1185G7 CPU and 16 GB RAM.
- 6.
The full set of pie charts is available in the GitHub repository.
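The mapping described in note 3 can be sketched as follows (a minimal illustration; the function name is ours, the transform is as stated in the note):

```python
import numpy as np

def append_growth_modifier(a, a5_prime):
    """Sketch of note 3 (function name is ours): the sixth action
    dimension a5' in [-1, 1] is multiplied by 5, rounded to an
    integer, and appended to the discrete action vector a."""
    g = int(round(a5_prime * 5))
    return np.append(a, g)
```

This yields an integer Population Growth modifier in {-5, ..., 5} attached to the remaining action dimensions.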
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Engelhardt, R.C., Raycheva, R., Lange, M., Wiskott, L., Konen, W. (2024). Ökolopoly: Case Study on Large Action Spaces in Reinforcement Learning. In: Nicosia, G., Ojha, V., La Malfa, E., La Malfa, G., Pardalos, P.M., Umeton, R. (eds) Machine Learning, Optimization, and Data Science. LOD 2023. Lecture Notes in Computer Science, vol 14506. Springer, Cham. https://doi.org/10.1007/978-3-031-53966-4_9
Print ISBN: 978-3-031-53965-7
Online ISBN: 978-3-031-53966-4