Abstract
We train a number of neural networks to play the games Bowling, Breakout and Seaquest using information stored in the memory of the Atari 2600 video game console. We consider four neural network models which differ in size and architecture: two networks which use only the information contained in the RAM and two mixed networks which use both the information in the RAM and the information from the screen.
As a benchmark we used the convolutional model proposed in [17] and obtained comparable results in all of the considered games. Quite surprisingly, in the case of Seaquest we were able to train RAM-only agents which perform better than the benchmark screen-only agent. Mixing screen and RAM did not lead to improved performance compared to the screen-only and RAM-only agents.
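For context, the RAM-only setup can be exercised directly through the Arcade Learning Environment [11]: the Atari 2600 exposes 128 bytes of RAM per frame, and this vector can serve as the agent's observation in place of the screen. Below is a minimal, hypothetical sketch assuming the ale_python_interface bindings and a locally available Seaquest ROM; it illustrates the input pipeline only and is not the paper's code (see the repository in reference [7]).

```python
import numpy as np
from ale_python_interface import ALEInterface  # bindings used by deep_q_rl

FRAME_SKIP = 4  # each agent decision is repeated for this many frames

ale = ALEInterface()
ale.loadROM('seaquest.bin')  # ROM path is an assumption
actions = ale.getMinimalActionSet()

print(ale.getRAMSize())  # 128 bytes on the Atari 2600

def step(action):
    """Advance FRAME_SKIP frames; return the final RAM vector and summed reward."""
    reward = 0
    for _ in range(FRAME_SKIP):
        reward += ale.act(action)
    ram = ale.getRAM().astype(np.float32) / 255.0  # scale bytes to [0, 1]
    return ram, reward

obs, r = step(actions[0])  # obs.shape == (128,)
```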
Notes
- 1.
For some games only a subset of these 18 actions is used during gameplay. The number of available actions is 4 for Breakout, 18 for Seaquest and 6 for Bowling.
- 2.
E.g. we may declare that \(Q(state,action) = \theta _1 f_1(state,action) + \theta _2 f_2(state,action)\), where \(f_1,f_2\) are fixed pre-defined functions; for example, \(f_1\) may assign the value 1 to the state-action pair (screen, fire) if a certain shape appears in the bottom-left corner of the screen and 0 otherwise, while \(f_2\) may assign the value 1 to (screen, left) if an enemy appears on the right and 0 otherwise. The Q-learning algorithm then learns the best values of \(\theta _1,\theta _2\) (a sketch of this linear update appears after these notes).
- 3.
This algorithm is also called a deep Q-network or DQN.
- 4.
The total number of experiments exceeded 100, but this includes experiments involving other models and repetitions of experiments described in this paper.
- 5.
We have not observed a statistically significant change in results when switching between replay memory sizes of \(10^5\) and \(5\cdot 10^5\).
- 6.
For Breakout we tested the networks with the best training-time results. The test consisted of choosing other random seeds and performing \(100\,000\) steps. For all networks, including the benchmark network nips, we obtained results consistently lower by about \(30\%\).
- 7.
The ale_ram evaluation method differs: the scores presented are averaged over 30 trials, each consisting of a long period of learning followed by a long period of testing. Nevertheless, the results are much worse than those of any DQN-based method presented here.
- 8.
We also tried to pass all the RAM states as a (\(128*\texttt {FRAME\, SKIP}\))-dimensional vector, but this did not improve performance (see the stacking sketch after these notes).
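As mentioned in note 2, the following is a minimal, self-contained sketch of one-step Q-learning with a linear function approximator. The predicates, action indices and hyperparameters are hypothetical stand-ins for the hand-crafted features described in that note; the update rule is the standard \(\theta \leftarrow \theta + \alpha \left( r + \gamma \max _{a'} Q(s',a') - Q(s,a)\right) f(s,a)\).

```python
import numpy as np

FIRE, LEFT = 0, 1          # hypothetical action indices
ACTIONS = [FIRE, LEFT]
ALPHA, GAMMA = 0.1, 0.99   # learning rate and discount factor

# Hypothetical hand-crafted predicates standing in for the ones in note 2.
def shape_in_bottom_left(state):
    return state[0] > 0.5

def enemy_on_right(state):
    return state[1] > 0.5

def features(state, action):
    f1 = 1.0 if action == FIRE and shape_in_bottom_left(state) else 0.0
    f2 = 1.0 if action == LEFT and enemy_on_right(state) else 0.0
    return np.array([f1, f2])

theta = np.zeros(2)  # the learned weights (theta_1, theta_2)

def q(state, action):
    return theta @ features(state, action)

def update(s, a, r, s_next):
    """One-step Q-learning update on a transition (s, a, r, s')."""
    td_target = r + GAMMA * max(q(s_next, b) for b in ACTIONS)
    td_error = td_target - q(s, a)
    theta += ALPHA * td_error * features(s, a)
```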
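Similarly, as mentioned in note 8, the (\(128*\texttt {FRAME\, SKIP}\))-dimensional variant simply concatenates the RAM snapshot taken after every skipped frame instead of keeping only the most recent one. A minimal sketch, under the same hypothetical ALE setup as the example following the abstract:

```python
import numpy as np
from ale_python_interface import ALEInterface

FRAME_SKIP = 4

ale = ALEInterface()
ale.loadROM('seaquest.bin')  # ROM path is an assumption

def step_stacked(action):
    """Collect the RAM state after each of the FRAME_SKIP frames and
    concatenate into one (128 * FRAME_SKIP)-dimensional observation."""
    reward, rams = 0, []
    for _ in range(FRAME_SKIP):
        reward += ale.act(action)
        rams.append(ale.getRAM().astype(np.float32) / 255.0)
    return np.concatenate(rams), reward  # shape: (512,) for FRAME_SKIP=4
```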
References
1. The Bowling Manual. https://atariage.com/manual_html_page.php?SoftwareID=879
2. Bowling (video game). https://en.wikipedia.org/wiki/Bowling_(video_game)
3. The Breakout Manual. https://atariage.com/manual_html_page.php?SoftwareID=889
4. Breakout (video game). https://en.wikipedia.org/wiki/Breakout_(video_game)
5. Lasagne - lightweight library to build and train neural networks in Theano. https://github.com/lasagne/lasagne
6. Nathan Sprague's implementation of DQN. https://github.com/spragunr/deep_q_rl
7. The repository of our code. https://github.com/sygi/deep_q_rl
8. The Seaquest Manual. https://atariage.com/manual_html_page.html?SoftwareLabelID=424
9. Seaquest (video game). https://en.wikipedia.org/wiki/Seaquest_(video_game)
10. Angluin, D.: Learning regular sets from queries and counterexamples. Inf. Comput. 75(2), 87–106 (1987)
11. Bellemare, M.G., Naddaf, Y., Veness, J., Bowling, M.: The arcade learning environment: an evaluation platform for general agents. J. Artif. Intell. Res. 47, 253–279 (2013)
12. Bergstra, J., Breuleux, O., Bastien, F., Lamblin, P., Pascanu, R., Desjardins, G., Turian, J., Warde-Farley, D., Bengio, Y.: Theano: a CPU and GPU math expression compiler. In: Proceedings of the Python for Scientific Computing Conference (SciPy), Oral Presentation (2010)
13. Braylan, A., Hollenbeck, M., Meyerson, E., Miikkulainen, R.: Frame skip is a powerful parameter for learning to play Atari. In: AAAI-15 Workshop on Learning for General Competency in Video Games (2015)
14. Defazio, A., Graepel, T.: A comparison of learning algorithms on the Arcade learning environment. CoRR abs/1410.8620 (2014). http://arxiv.org/abs/1410.8620
15. Liang, Y., Machado, M.C., Talvitie, E., Bowling, M.: State of the art control of Atari games using shallow reinforcement learning. arXiv preprint arXiv:1512.01563 (2015)
16. Lipovetzky, N., Ramirez, M., Geffner, H.: Classical planning with simulators: results on the Atari video games. In: International Joint Conference on Artificial Intelligence (IJCAI), pp. 1610–1616 (2015)
17. Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., Riedmiller, M.: Playing Atari with deep reinforcement learning. In: NIPS Deep Learning Workshop (2013)
18. Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A.A., Veness, J., Bellemare, M.G., Graves, A., Riedmiller, M., Fidjeland, A.K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., Hassabis, D.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)
19. Russell, S.J., Norvig, P.: Artificial Intelligence: A Modern Approach, 3rd edn. Pearson Education, Englewood Cliffs (2010)
20. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014)
21. Van Hasselt, H., Guez, A., Silver, D.: Deep reinforcement learning with double Q-learning. arXiv preprint arXiv:1509.06461 (2015)
22. Vincent, P., Larochelle, H., Bengio, Y., Manzagol, P.A.: Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th International Conference on Machine Learning (ICML 2008), pp. 1096–1103. ACM, New York (2008)
23. Wang, Z., Schaul, T., Hessel, M., van Hasselt, H., Lanctot, M., de Freitas, N.: Dueling network architectures for deep reinforcement learning. arXiv preprint arXiv:1511.06581 (2015)
24. Warde-Farley, D., Goodfellow, I.J., Courville, A., Bengio, Y.: An empirical analysis of dropout in piecewise linear networks. In: ICLR 2014 (2014)
25. Watkins, C.J.C.H., Dayan, P.: Technical note: Q-learning. Mach. Learn. 8, 279–292 (1992)
Acknowledgements
This research was carried out with the support of grant GG63-11 awarded by the Interdisciplinary Centre for Mathematical and Computational Modelling (ICM), University of Warsaw. We would like to thank Marc G. Bellemare for suggesting this research topic.