
Ökolopoly: Case Study on Large Action Spaces in Reinforcement Learning

Conference paper in: Machine Learning, Optimization, and Data Science (LOD 2023)

Abstract

Ökolopoly is a serious game developed by the biochemist Frederic Vester with the goal of enhancing the understanding of interactions in complex systems. Due to its vast observation and action spaces, it presents a challenge for Deep Reinforcement Learning (DRL). In this paper, we make the board game available as a reinforcement learning environment and compare different methods of making the large spaces manageable. Our aim is to determine the conditions under which DRL agents are able to learn this game from self-play. To this end, we implement various wrappers to reduce the observation and action spaces and to change the reward structure. We train PPO, SAC, and TD3 agents on combinations of these wrappers and compare their performance. We analyze the contribution of different representations of observation and action spaces to successful learning and the possibility of steering the DRL agents’ gameplay by shaping reward functions.
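As a rough illustration of the wrapper-based approach, the following Python sketch shows how a reduced continuous action space can be exposed to a Stable-Baselines3 agent and rescaled to an environment's original integer action range. This is a minimal sketch under assumptions, not the paper's implementation: the wrapper name, the environment id "Oekolopoly-v1", and the bounds are hypothetical placeholders.

    # Illustrative sketch only: the environment id, wrapper name, and bounds
    # are hypothetical and not taken from the paper's repository.
    import gymnasium as gym
    import numpy as np
    from stable_baselines3 import PPO

    class ScaledBoxAction(gym.ActionWrapper):
        """Expose a normalized Box action space in [-1, 1] and rescale each
        component to the environment's original integer action range."""

        def __init__(self, env, orig_low, orig_high):
            super().__init__(env)
            self.orig_low = np.asarray(orig_low, dtype=np.float32)
            self.orig_high = np.asarray(orig_high, dtype=np.float32)
            self.action_space = gym.spaces.Box(-1.0, 1.0, shape=self.orig_low.shape)

        def action(self, act):
            # Map [-1, 1] linearly onto [orig_low, orig_high] and round to integers.
            scaled = self.orig_low + (act + 1.0) / 2.0 * (self.orig_high - self.orig_low)
            return np.round(scaled).astype(np.int64)

    # Hypothetical usage:
    # env = ScaledBoxAction(gym.make("Oekolopoly-v1"), orig_low=..., orig_high=...)
    # model = PPO("MlpPolicy", env, verbose=0)
    # model.learn(total_timesteps=100_000)

An analogous ObservationWrapper or RewardWrapper would follow the same pattern for reducing the observation space or changing the reward structure.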

This research was supported by the research training group “Dataninja” (Trustworthy AI for Seamless Problem Solving: Next Generation Intelligence Joins Robust Data Analysis) funded by the German federal state of North Rhine-Westphalia.


Notes

  1. An advanced version of the game provides optional “event cards” to be drawn every five rounds. We ignore this advanced version in our implementation.

  2. https://github.com/WolfgangKonen/oekolopoly_v1.

  3. The sixth dimension \(a_{5}' \in [-1,1]\) (modifier for Population Growth g, Sect. 3.1) is multiplied by 5, rounded, and then appended to \(\textbf{a}\); see the short sketch after these notes.

  4. As an example, the string 020101 encodes the following distribution of action points: one third of the action points is added to Education, and two thirds of the action points are deducted from Production (the rightmost digit is 1).

  5. Timing experiments were performed on a system with an Intel® Core™ i7-1185G7 CPU and 16 GB RAM.

  6. The full set of pie charts is available in the GitHub repository.
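The mapping described in note 3 is simple enough to spell out in code. The following sketch assumes NumPy and uses a hypothetical helper name that is not taken from the paper's code:

    import numpy as np

    def append_growth_modifier(a, a5_prime):
        # Scale the continuous modifier a5' in [-1, 1] to an integer in {-5, ..., 5}
        # and append it to the discrete action vector a (hypothetical helper).
        g_mod = int(np.round(5 * a5_prime))
        return np.append(a, g_mod)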


Author information


Correspondence to Raphael C. Engelhardt.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Engelhardt, R.C., Raycheva, R., Lange, M., Wiskott, L., Konen, W. (2024). Ökolopoly: Case Study on Large Action Spaces in Reinforcement Learning. In: Nicosia, G., Ojha, V., La Malfa, E., La Malfa, G., Pardalos, P.M., Umeton, R. (eds) Machine Learning, Optimization, and Data Science. LOD 2023. Lecture Notes in Computer Science, vol 14506. Springer, Cham. https://doi.org/10.1007/978-3-031-53966-4_9


  • DOI: https://doi.org/10.1007/978-3-031-53966-4_9


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-53965-7

  • Online ISBN: 978-3-031-53966-4

  • eBook Packages: Computer Science, Computer Science (R0)
