Abstract
We train a number of neural networks to play the games Bowling, Breakout and Seaquest using information stored in the memory of the Atari 2600 video game console. We consider four neural network models which differ in size and architecture: two networks which use only the information contained in the RAM and two mixed networks which use both the information in the RAM and the information from the screen.
As a benchmark we used the convolutional model proposed in [17] and obtained comparable results in all of the considered games. Quite surprisingly, in the case of Seaquest we were able to train RAM-only agents which perform better than the benchmark screen-only agent. Mixing screen and RAM did not lead to improved performance compared to the screen-only and RAM-only agents.
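For context, the RAM-only setup can be exercised directly through the Arcade Learning Environment [11]: the Atari 2600 exposes 128 bytes of RAM per frame, and this vector can serve as the agent's observation in place of the screen. Below is a minimal, hypothetical sketch assuming the ale_python_interface bindings and a locally available Seaquest ROM; it illustrates the input pipeline only and is not the paper's code (see the repository in reference [7]).

```python
import numpy as np
from ale_python_interface import ALEInterface  # bindings used by deep_q_rl

FRAME_SKIP = 4  # each agent decision is repeated for this many frames

ale = ALEInterface()
ale.loadROM('seaquest.bin')  # ROM path is an assumption
actions = ale.getMinimalActionSet()

print(ale.getRAMSize())  # 128 bytes on the Atari 2600

def step(action):
    """Advance FRAME_SKIP frames; return the final RAM vector and summed reward."""
    reward = 0
    for _ in range(FRAME_SKIP):
        reward += ale.act(action)
    ram = ale.getRAM().astype(np.float32) / 255.0  # scale bytes to [0, 1]
    return ram, reward

obs, r = step(actions[0])  # obs.shape == (128,)
```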
Notes
- 1.
For some games only a subset of these 18 actions is used during gameplay. The number of available actions is 4 for Breakout, 18 for Seaquest and 6 for Bowling.
- 2.
E.g. we may declare that \(Q(state,action) = \theta _1 f_1(state,action) + \theta _2 f_2(state,action)\), where \(f_1,f_2\) are fixed pre-defined functions; for example, \(f_1\) may assign the value 1 to the state-action pair (screen, fire) if a certain shape appears in the bottom-left corner of the screen and 0 otherwise, while \(f_2\) may assign the value 1 to (screen, left) if an enemy appears on the right and 0 otherwise. The Q-learning algorithm then learns the best values of \(\theta _1,\theta _2\) (a sketch of this linear update appears after these notes).
- 3.
This algorithm is also called a deep Q-network or DQN.
- 4.
The total number of experiments exceeded 100, but this includes experiments involving other models and repetitions of experiments described in this paper.
- 5.
We have not observed a statistically significant change in results when switching between replay memory sizes of \(10^5\) and \(5\cdot 10^5\).
- 6.
For Breakout we tested the networks with the best training-time results. The test consisted of choosing other random seeds and performing \(100\,000\) steps. For all networks, including the benchmark network nips, we obtained results consistently lower by about \(30\%\).
- 7.
The ale_ram evaluation method differs: the scores presented are averaged over 30 trials, each consisting of a long period of learning followed by a long period of testing. Nevertheless, the results are much worse than those of any DQN-based method presented here.
- 8.
We also tried to pass all the RAM states as a (\(128*\texttt {FRAME\, SKIP}\))-dimensional vector, but this did not improve performance (see the stacking sketch after these notes).
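As mentioned in note 2, the following is a minimal, self-contained sketch of one-step Q-learning with a linear function approximator. The predicates, action indices and hyperparameters are hypothetical stand-ins for the hand-crafted features described in that note; the update rule is the standard \(\theta \leftarrow \theta + \alpha \left( r + \gamma \max _{a'} Q(s',a') - Q(s,a)\right) f(s,a)\).

```python
import numpy as np

FIRE, LEFT = 0, 1          # hypothetical action indices
ACTIONS = [FIRE, LEFT]
ALPHA, GAMMA = 0.1, 0.99   # learning rate and discount factor

# Hypothetical hand-crafted predicates standing in for the ones in note 2.
def shape_in_bottom_left(state):
    return state[0] > 0.5

def enemy_on_right(state):
    return state[1] > 0.5

def features(state, action):
    f1 = 1.0 if action == FIRE and shape_in_bottom_left(state) else 0.0
    f2 = 1.0 if action == LEFT and enemy_on_right(state) else 0.0
    return np.array([f1, f2])

theta = np.zeros(2)  # the learned weights (theta_1, theta_2)

def q(state, action):
    return theta @ features(state, action)

def update(s, a, r, s_next):
    """One-step Q-learning update on a transition (s, a, r, s')."""
    td_target = r + GAMMA * max(q(s_next, b) for b in ACTIONS)
    td_error = td_target - q(s, a)
    theta += ALPHA * td_error * features(s, a)
```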
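Similarly, as mentioned in note 8, the (\(128*\texttt {FRAME\, SKIP}\))-dimensional variant simply concatenates the RAM snapshot taken after every skipped frame instead of keeping only the most recent one. A minimal sketch, under the same hypothetical ALE setup as the example following the abstract:

```python
import numpy as np
from ale_python_interface import ALEInterface

FRAME_SKIP = 4

ale = ALEInterface()
ale.loadROM('seaquest.bin')  # ROM path is an assumption

def step_stacked(action):
    """Collect the RAM state after each of the FRAME_SKIP frames and
    concatenate into one (128 * FRAME_SKIP)-dimensional observation."""
    reward, rams = 0, []
    for _ in range(FRAME_SKIP):
        reward += ale.act(action)
        rams.append(ale.getRAM().astype(np.float32) / 255.0)
    return np.concatenate(rams), reward  # shape: (512,) for FRAME_SKIP=4
```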
References
1. The Bowling Manual. https://atariage.com/manual_html_page.php?SoftwareID=879
2. Bowling (video game). https://en.wikipedia.org/wiki/Bowling_(video_game)
3. The Breakout Manual. https://atariage.com/manual_html_page.php?SoftwareID=889
4. Breakout (video game). https://en.wikipedia.org/wiki/Breakout_(video_game)
5. Lasagne - lightweight library to build and train neural networks in Theano. https://github.com/lasagne/lasagne
6. Nathan Sprague's implementation of DQN. https://github.com/spragunr/deep_q_rl
7. The repository of our code. https://github.com/sygi/deep_q_rl
8. The Seaquest Manual. https://atariage.com/manual_html_page.html?SoftwareLabelID=424
9. Seaquest (video game). https://en.wikipedia.org/wiki/Seaquest_(video_game)
10. Angluin, D.: Learning regular sets from queries and counterexamples. Inf. Comput. 75(2), 87–106 (1987)
11. Bellemare, M.G., Naddaf, Y., Veness, J., Bowling, M.: The arcade learning environment: an evaluation platform for general agents. J. Artif. Intell. Res. 47, 253–279 (2013)
12. Bergstra, J., Breuleux, O., Bastien, F., Lamblin, P., Pascanu, R., Desjardins, G., Turian, J., Warde-Farley, D., Bengio, Y.: Theano: a CPU and GPU math expression compiler. In: Proceedings of the Python for Scientific Computing Conference (SciPy), Oral Presentation (2010)
13. Braylan, A., Hollenbeck, M., Meyerson, E., Miikkulainen, R.: Frame skip is a powerful parameter for learning to play Atari. In: AAAI-15 Workshop on Learning for General Competency in Video Games (2015)
14. Defazio, A., Graepel, T.: A comparison of learning algorithms on the Arcade learning environment. CoRR abs/1410.8620 (2014). http://arxiv.org/abs/1410.8620
15. Liang, Y., Machado, M.C., Talvitie, E., Bowling, M.: State of the art control of Atari games using shallow reinforcement learning. arXiv preprint arXiv:1512.01563 (2015)
16. Lipovetzky, N., Ramirez, M., Geffner, H.: Classical planning with simulators: results on the Atari video games. In: International Joint Conference on Artificial Intelligence (IJCAI), pp. 1610–1616 (2015)
17. Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., Riedmiller, M.: Playing Atari with deep reinforcement learning. In: NIPS Deep Learning Workshop (2013)
18. Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A.A., Veness, J., Bellemare, M.G., Graves, A., Riedmiller, M., Fidjeland, A.K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., Hassabis, D.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)
19. Russell, S.J., Norvig, P.: Artificial Intelligence: A Modern Approach, 3rd edn. Pearson Education, Englewood Cliffs (2010)
20. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014)
21. Van Hasselt, H., Guez, A., Silver, D.: Deep reinforcement learning with double Q-learning. arXiv preprint arXiv:1509.06461 (2015)
22. Vincent, P., Larochelle, H., Bengio, Y., Manzagol, P.A.: Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th International Conference on Machine Learning (ICML 2008), pp. 1096–1103. ACM, New York (2008)
23. Wang, Z., Schaul, T., Hessel, M., van Hasselt, H., Lanctot, M., de Freitas, N.: Dueling network architectures for deep reinforcement learning. arXiv preprint arXiv:1511.06581 (2015)
24. Warde-Farley, D., Goodfellow, I.J., Courville, A., Bengio, Y.: An empirical analysis of dropout in piecewise linear networks. In: ICLR 2014 (2014)
25. Watkins, C.J.C.H., Dayan, P.: Technical note: Q-learning. Mach. Learn. 8, 279–292 (1992)
Acknowledgements
This research was carried out with the support of grant GG63-11 awarded by the Interdisciplinary Centre for Mathematical and Computational Modelling (ICM), University of Warsaw. We would like to thank Marc G. Bellemare for suggesting this research topic.