Learning from the Memory of Atari 2600

  • Conference paper
Computer Games (CGW 2016, GIGA 2016)

Abstract

We train a number of neural networks to play the games Bowling, Breakout and Seaquest using information stored in the memory of the Atari 2600 video game console. We consider four neural network models which differ in size and architecture: two networks which use only the information contained in the RAM and two mixed networks which use both the information in the RAM and the information from the screen.

As a benchmark we used the convolutional model proposed in [17] and obtained comparable results in all of the considered games. Quite surprisingly, in the case of Seaquest we were able to train RAM-only agents which perform better than the benchmark screen-only agent. Mixing screen and RAM input did not lead to improved performance compared to the screen-only and RAM-only agents.
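
To make the RAM-only setting concrete, the sketch below shows how a Q-network taking the 128-byte Atari 2600 RAM as input could be assembled with Lasagne [5]. The layer sizes, variable names and normalization are illustrative assumptions, not the exact architectures evaluated in the paper.

```python
# A minimal sketch (not the authors' exact architecture) of a RAM-only
# Q-network: the 128-byte Atari 2600 RAM is fed through two fully
# connected hidden layers and a linear output layer producing one
# Q-value per action.
import lasagne
import theano
import theano.tensor as T

RAM_SIZE = 128       # the Atari 2600 has 128 bytes of RAM
N_ACTIONS = 18       # full ALE action set; fewer are used per game
HIDDEN_UNITS = 128   # hypothetical size, chosen for illustration only

ram_input = T.matrix('ram')  # batch of normalized RAM vectors
network = lasagne.layers.InputLayer(shape=(None, RAM_SIZE), input_var=ram_input)
network = lasagne.layers.DenseLayer(network, num_units=HIDDEN_UNITS,
                                    nonlinearity=lasagne.nonlinearities.rectify)
network = lasagne.layers.DenseLayer(network, num_units=HIDDEN_UNITS,
                                    nonlinearity=lasagne.nonlinearities.rectify)
q_values = lasagne.layers.DenseLayer(network, num_units=N_ACTIONS,
                                     nonlinearity=None)  # linear Q-value outputs

# Compile a function mapping RAM states to Q-values for all actions.
predict_q = theano.function([ram_input],
                            lasagne.layers.get_output(q_values))
```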


Notes

  1. For some games, only a subset of these 18 actions is used in gameplay. The number of available actions is 4 for Breakout, 6 for Bowling and 18 for Seaquest.

  2. E.g. we may declare that \(Q(state,action) = \theta _1 f_1(state,action) + \theta _2 f_2(state,action)\), where \(f_1,f_2\) are fixed pre-defined functions; for example, \(f_1\) may assign the value 1 to the state-action pair (screen, fire) if a certain shape appears in the bottom-left corner of the screen and 0 otherwise, and \(f_2\) may assign the value 1 to (screen, left) if an enemy appears on the right and 0 otherwise. The Q-learning algorithm then learns the best values of \(\theta _1,\theta _2\). A minimal worked sketch of this kind of update is given after these notes.

  3. This algorithm is also called a deep Q-network or DQN.

  4. The total number of experiments exceeded 100, but this includes experiments involving other models and repetitions of the experiments described in this paper.

  5. We did not observe a statistically significant change in results when switching the replay memory size between \(10^5\) and \(5\cdot 10^5\).

  6. For Breakout we tested the networks with the best training-time results. The test consisted of choosing other random seeds and performing \(100\,000\) steps. For all networks, including nips, we obtained results consistently lower by about \(30\%\).

  7. The evaluation method of ale_ram differs: the scores presented are averages over 30 trials, each consisting of a long period of learning followed by a long period of testing. Nevertheless, the results are much worse than those of any DQN-based method presented here.

  8. We also tried passing all the RAM states as a (\(128*\texttt {FRAME\, SKIP}\))-dimensional vector, but this did not lead to improved performance. A small sketch of this stacking is given after these notes.
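
A minimal sketch of the linear Q-function approximation described in note 2. The features, constants and state encoding below are our own illustrative choices, not anything used in the experiments.

```python
import numpy as np

# Two hypothetical hand-crafted features f1, f2 of a (state, action) pair,
# as in note 2: each returns 0 or 1 depending on the screen contents.
def f1(state, action):
    return 1.0 if action == "fire" and state.get("shape_bottom_left") else 0.0

def f2(state, action):
    return 1.0 if action == "left" and state.get("enemy_on_right") else 0.0

theta = np.zeros(2)           # weights theta_1, theta_2 to be learned
ALPHA, GAMMA = 0.1, 0.99      # learning rate and discount (illustrative values)
ACTIONS = ["fire", "left"]

def q(state, action):
    """Linear approximation Q(s, a) = theta_1 * f1(s, a) + theta_2 * f2(s, a)."""
    return theta[0] * f1(state, action) + theta[1] * f2(state, action)

def q_learning_update(state, action, reward, next_state):
    """One gradient-style Q-learning step on the weights theta."""
    target = reward + GAMMA * max(q(next_state, a) for a in ACTIONS)
    td_error = target - q(state, action)
    grad = np.array([f1(state, action), f2(state, action)])
    theta[:] = theta + ALPHA * td_error * grad
```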

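For note 8, a small numpy sketch (names and the normalization are our own assumptions) of how the RAM contents of all skipped frames could be stacked into one (128 * FRAME_SKIP)-dimensional input vector:

```python
import numpy as np

FRAME_SKIP = 4   # number of consecutive emulator frames per agent step
RAM_SIZE = 128   # bytes of Atari 2600 RAM per frame

def stack_ram_states(ram_frames):
    """Concatenate the RAM of FRAME_SKIP consecutive frames into a single
    (128 * FRAME_SKIP)-dimensional network input, scaled to [0, 1]."""
    assert len(ram_frames) == FRAME_SKIP
    stacked = np.concatenate([np.asarray(f, dtype=np.float32) for f in ram_frames])
    return stacked / 255.0  # byte values normalized before feeding the network

# Example: four dummy RAM snapshots produce a 512-dimensional input vector.
frames = [np.random.randint(0, 256, RAM_SIZE) for _ in range(FRAME_SKIP)]
print(stack_ram_states(frames).shape)  # (512,)
```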
References

  1. The Bowling Manual. https://atariage.com/manual_html_page.php?SoftwareID=879

  2. Bowling (video game). https://en.wikipedia.org/wiki/Bowling_(video_game)

  3. The Breakout Manual. https://atariage.com/manual_html_page.php?SoftwareID=889

  4. Breakout (video game). https://en.wikipedia.org/wiki/Breakout_(video_game)

  5. Lasagne - lightweight library to build and train neural networks in Theano. https://github.com/lasagne/lasagne

  6. Nathan Sprague’s implementation of DQN. https://github.com/spragunr/deep_q_rl

  7. The repository of our code. https://github.com/sygi/deep_q_rl

  8. The Seaquest manual. https://atariage.com/manual_html_page.html?SoftwareLabelID=424

  9. Seaquest (video game). https://en.wikipedia.org/wiki/Seaquest_(video_game)

  10. Angluin, D.: Learning regular sets from queries and counterexamples. Inf. Comput. 75(2), 87–106 (1987)

  11. Bellemare, M.G., Naddaf, Y., Veness, J., Bowling, M.: The arcade learning environment: an evaluation platform for general agents. J. Artif. Intell. Res. 47, 253–279 (2013)

  12. Bergstra, J., Breuleux, O., Bastien, F., Lamblin, P., Pascanu, R., Desjardins, G., Turian, J., Warde-Farley, D., Bengio, Y.: Theano: a CPU and GPU math expression compiler. In: Proceedings of the Python for Scientific Computing Conference (SciPy), Oral Presentation (2010)

  13. Braylan, A., Hollenbeck, M., Meyerson, E., Miikkulainen, R.: Frame skip is a powerful parameter for learning to play Atari. In: AAAI-15 Workshop on Learning for General Competency in Video Games (2015)

  14. Defazio, A., Graepel, T.: A comparison of learning algorithms on the Arcade learning environment. CoRR abs/1410.8620 (2014). http://arxiv.org/abs/1410.8620

  15. Liang, Y., Machado, M.C., Talvitie, E., Bowling, M.: State of the art control of Atari games using shallow reinforcement learning. arXiv preprint arXiv:1512.01563 (2015)

  16. Lipovetzky, N., Ramirez, M., Geffner, H.: Classical planning with simulators: results on the Atari video games. In: International Joint Conference on Artificial Intelligence (IJCAI), pp. 1610–1616 (2015)

  17. Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., Riedmiller, M.: Playing Atari with deep reinforcement learning. In: NIPS Deep Learning Workshop (2013)

  18. Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A.A., Veness, J., Bellemare, M.G., Graves, A., Riedmiller, M., Fidjeland, A.K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., Hassabis, D.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)

  19. Russell, S.J., Norvig, P.: Artificial Intelligence: A Modern Approach, 3rd edn. Pearson Education, Englewood Cliffs (2010)

  20. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014)

  21. Van Hasselt, H., Guez, A., Silver, D.: Deep reinforcement learning with double Q-learning. arXiv preprint arXiv:1509.06461 (2015)

  22. Vincent, P., Larochelle, H., Bengio, Y., Manzagol, P.A.: Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th International Conference on Machine Learning. ICML 2008, pp. 1096–1103. ACM, New York (2008)

  23. Wang, Z., Schaul, T., Hessel, M., van Hasselt, H., Lanctot, M., de Freitas, N.: Dueling network architectures for deep reinforcement learning. arXiv preprint arXiv:1511.06581 (2015)

  24. Warde-Farley, D., Goodfellow, I.J., Courville, A., Bengio, Y.: An empirical analysis of dropout in piecewise linear networks. In: ICLR 2014 (2014)

  25. Watkins, C.J.C.H., Dayan, P.: Q-learning. Mach. Learn. 8, 279–292 (1992)

Acknowledgements

This research was carried out with the support of grant GG63-11 awarded by the Interdisciplinary Centre for Mathematical and Computational Modelling (ICM), University of Warsaw. We would like to express our thanks to Marc G. Bellemare for suggesting this research topic.

Author information

Correspondence to Jakub Sygnowski.

A Parameters

The list of hyperparameters and their descriptions is given in Table 6; most of the descriptions come from [18].

Table 6. Parameters

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Sygnowski, J., Michalewski, H. (2017). Learning from the Memory of Atari 2600. In: Cazenave, T., Winands, M., Edelkamp, S., Schiffel, S., Thielscher, M., Togelius, J. (eds) Computer Games. CGW 2016, GIGA 2016. Communications in Computer and Information Science, vol. 705. Springer, Cham. https://doi.org/10.1007/978-3-319-57969-6_6

  • DOI: https://doi.org/10.1007/978-3-319-57969-6_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-57968-9

  • Online ISBN: 978-3-319-57969-6
