Abstract
Ökolopoly is a serious game developed by biochemist Frederic Vester with the goal of enhancing the understanding of interactions in complex systems. Due to its vast observation and action spaces, it poses a challenge for Deep Reinforcement Learning (DRL). In this paper, we make the board game available as a reinforcement learning environment and compare different methods of making the large spaces manageable. Our aim is to determine the conditions under which DRL agents are able to learn this game from self-play. To this end, we implement various wrappers to reduce the observation and action spaces and to change the reward structure. We train PPO, SAC, and TD3 agents on combinations of these wrappers and compare their performance. We analyze the contribution of different representations of the observation and action spaces to successful learning, as well as the possibility of steering the DRL agents’ gameplay by shaping reward functions.
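To make the wrapper idea concrete, here is a minimal, hypothetical sketch (not the authors' actual implementation; the class name and `scale` parameter are ours) of one common way to shrink a large discrete action space: let the agent act in a small continuous box and map its output to the game's integer action points.

```python
import numpy as np

class ContinuousToDiscreteActions:
    """Hypothetical sketch of a space-reducing wrapper (not the paper's
    exact implementation): the agent emits actions in [-1, 1]^n, and the
    wrapper scales and rounds them to the game's integer action points."""

    def __init__(self, env, scale=5):
        self.env = env          # the underlying game environment
        self.scale = scale      # assumed maximum action points per dimension

    def step(self, action):
        # map the continuous agent output to discrete game actions
        discrete = np.round(np.asarray(action) * self.scale).astype(int)
        return self.env.step(discrete)
```

A wrapper of this kind leaves the environment itself untouched, so different reductions can be swapped in and compared, which is the experimental pattern the paper describes.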
This research was supported by the research training group “Dataninja” (Trustworthy AI for Seamless Problem Solving: Next Generation Intelligence Joins Robust Data Analysis) funded by the German federal state of North Rhine-Westphalia.
Notes
- 1.
An advanced version of the game provides optional “event cards” to be drawn every five rounds. We ignore this advanced version in our implementation.
- 2.
- 3.
The sixth dimension \(a_{5}' \in [-1,1]\) (modifier for Population Growth g, Sect. 3.1) is multiplied by 5, rounded, and then appended to \(\textbf{a}\).
- 4.
As an example, the string 020101 encodes the following distribution of action points: one third of the action points is added to Education, and two thirds are deducted from Production (the rightmost digit 1 indicates a deduction).
- 5.
Timing experiments were performed on a system with an Intel® Core™ i7-1185G7 CPU and 16 GB RAM.
- 6.
The full set of pie charts is available in the GitHub repository.
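The mapping described in note 3 can be sketched as follows (a minimal illustration; the function name is ours, the transform is as stated in the note):

```python
import numpy as np

def append_growth_modifier(a, a5_prime):
    """Sketch of note 3 (function name is ours): the sixth action
    dimension a5' in [-1, 1] is multiplied by 5, rounded to an
    integer, and appended to the discrete action vector a."""
    g = int(round(a5_prime * 5))
    return np.append(a, g)
```

This yields an integer Population Growth modifier in {-5, ..., 5} attached to the remaining action dimensions.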
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Engelhardt, R.C., Raycheva, R., Lange, M., Wiskott, L., Konen, W. (2024). Ökolopoly: Case Study on Large Action Spaces in Reinforcement Learning. In: Nicosia, G., Ojha, V., La Malfa, E., La Malfa, G., Pardalos, P.M., Umeton, R. (eds) Machine Learning, Optimization, and Data Science. LOD 2023. Lecture Notes in Computer Science, vol 14506. Springer, Cham. https://doi.org/10.1007/978-3-031-53966-4_9
Print ISBN: 978-3-031-53965-7
Online ISBN: 978-3-031-53966-4