
Learning to Play Donkey Kong Using Neural Networks and Reinforcement Learning

Conference paper, Artificial Intelligence (BNAIC 2017).

Part of the book series: Communications in Computer and Information Science (CCIS, volume 823).

Abstract

Neural networks and reinforcement learning have been successfully applied to various games, such as Ms. Pac-Man and Go. We combine multilayer perceptrons with a class of reinforcement learning algorithms known as actor-critic to learn to play the arcade classic Donkey Kong. Two neural networks are used in this study: the actor and the critic. The actor learns to select the best action given the game state; the critic learns to estimate the value of being in a given state. First, a baseline game-playing performance is obtained by learning from demonstration, using data collected from human players. After this offline training phase, we further improve on the baseline using feedback from the critic, which compares the value of the state before and after the action is taken. Results show that an agent pre-trained on demonstration data achieves a good baseline performance. Applying actor-critic methods, however, usually does not improve performance and in many cases even decreases it. Possible explanations include the game not being fully Markovian.
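The actor-critic update described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the paper uses multilayer perceptrons trained on game features, whereas here linear function approximators, the state dimensions, and the learning rates are all placeholder assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the game state: a small feature vector (hypothetical sizes).
STATE_DIM, N_ACTIONS = 4, 3

# Critic: linear value estimate V(s) = w @ s (the paper uses an MLP instead).
w = np.zeros(STATE_DIM)
# Actor: linear action preferences with a softmax policy (again, MLP in the paper).
theta = np.zeros((N_ACTIONS, STATE_DIM))

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def actor_critic_step(s, s_next, a, reward, gamma=0.99, alpha=0.1, beta=0.1):
    """One actor-critic update. The critic's feedback is the TD error:
    it compares the (discounted) value of the state after the action
    with the value of the state before it, plus the observed reward."""
    global w, theta
    td_error = reward + gamma * (w @ s_next) - (w @ s)  # critic's feedback signal
    w += alpha * td_error * s                           # critic update (TD learning)
    probs = softmax(theta @ s)
    # Gradient of log pi(a|s) for a softmax policy with linear preferences:
    # d/d theta[b] = (1[b == a] - pi(b|s)) * s
    grad_log_pi = -probs[:, None] * s
    grad_log_pi[a] += s
    theta += beta * td_error * grad_log_pi              # actor update (policy gradient)
    return td_error

s = rng.normal(size=STATE_DIM)
s_next = rng.normal(size=STATE_DIM)
delta = actor_critic_step(s, s_next, a=1, reward=1.0)
```

With the critic initialized to zero, the first TD error simply equals the reward; as the critic improves, the error shrinks toward the true advantage of the chosen action, which is what makes it a useful training signal for the actor.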

The third author acknowledges support from the Amsterdam Academic Alliance (AAA) on data science.



Author information

Correspondence to Marco Wiering.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper


Cite this paper

Ozkohen, P., Visser, J., van Otterlo, M., Wiering, M. (2018). Learning to Play Donkey Kong Using Neural Networks and Reinforcement Learning. In: Verheij, B., Wiering, M. (eds) Artificial Intelligence. BNAIC 2017. Communications in Computer and Information Science, vol 823. Springer, Cham. https://doi.org/10.1007/978-3-319-76892-2_11


  • DOI: https://doi.org/10.1007/978-3-319-76892-2_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-76891-5

  • Online ISBN: 978-3-319-76892-2

  • eBook Packages: Computer Science (R0)
