Abstract
Neural networks and reinforcement learning have successfully been applied to various games, such as Ms. Pacman and Go. We combine multilayer perceptrons and a class of reinforcement learning algorithms known as actor-critic to learn to play the arcade classic Donkey Kong. Two neural networks are used in this study: the actor and the critic. The actor learns to select the best action given the game state; the critic tries to learn the value of being in a certain state. First, a base game-playing performance is obtained by learning from demonstration, where data is obtained from human players. After this off-line training phase we further improve the base performance using feedback from the critic. The critic gives feedback by comparing the value of the state before and after taking the action. Results show that an agent pre-trained on demonstration data is able to achieve a good baseline performance. Applying actor-critic methods, however, does usually not improve performance, in many cases even decreases it. Possible reasons include the game not fully being Markovian and other issues.
The third author acknowledges support from the Amsterdam academic alliance (AAA) on data science.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Atkeson, C.G., Schaal, S.: Robot learning from demonstration. In: Proceedings of the International Conference on Machine Learning, pp. 12–20 (1997)
Baird, L.: Residual algorithms: reinforcement learning with function approximation. In: Proceedings of the Twelfth International Conference on Machine Learning, pp. 30–37 (1995)
Bom, L., Henken, R., Wiering, M.: Reinforcement learning to train Ms. Pac-Man using higher-order action-relative inputs. In: IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (2013)
Donkey Kong fansite wiki. http://donkeykong.wikia.com/wiki/Nintendo. Accessed Sept 2017
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., Riedmiller, M.: Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015)
Shantia, A., Begue, E., Wiering, M.: Connectionist reinforcement learning for intelligent unit micro management in Starcraft. In: The 2011 International Joint Conference on Neural Networks, pp. 1794–1801 (2011)
Silver, D., Huang, A., Maddison, C.J., Guez, A., Sifre, L., van den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T., Hassabis, D.: Mastering the game of Go with deep neural networks and tree search. Nature 529(7587), 484–489 (2016)
Sutton, R.S.: Learning to predict by the methods of temporal differences. Mach. Learn. 3(1), 9–44 (1988)
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. The MIT Press, Cambridge (1998)
Takahashi, Y., Schoenbaum, G., Niv, Y.: Silencing the critics: understanding the effects of cocaine sensitization on dorsolateral and ventral striatum in the context of an actor/critic model. Front. Neurosci. 2(1), 86–99 (2008)
van Seijen, H., Fatemi, M., Romoff, J., Laroche, R., Barnes, T., Tsang, J.: Hybrid reward architecture for reinforcement learning (2017). https://arxiv.org/abs/1706.04208
Watkins, C.J.: Learning from delayed rewards. Ph.D. Thesis, University of Cambridge, England (1989)
Wiering, M., van Otterlo, M.: Reinforcement Learning: State of the Art. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-27645-3
Wiering, M.A., Van Hasselt, H.: Two novel on-policy reinforcement learning algorithms based on TD(\(\lambda \))-methods. In: 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, pp. 280–287 (2007)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Ozkohen, P., Visser, J., van Otterlo, M., Wiering, M. (2018). Learning to Play Donkey Kong Using Neural Networks and Reinforcement Learning. In: Verheij, B., Wiering, M. (eds) Artificial Intelligence. BNAIC 2017. Communications in Computer and Information Science, vol 823. Springer, Cham. https://doi.org/10.1007/978-3-319-76892-2_11
Download citation
DOI: https://doi.org/10.1007/978-3-319-76892-2_11
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-76891-5
Online ISBN: 978-3-319-76892-2
eBook Packages: Computer ScienceComputer Science (R0)