Deep Reinforcement Learning with Hidden Layers on Future States

Conference paper

Computer Games (CGW 2017)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 818)

Abstract

Deep reinforcement learning algorithms such as Deep Q-Networks have successfully been used to construct strong agents for Atari games by performing only a direct evaluation of the current state and actions. This is in stark contrast to algorithms for traditional board games such as Chess or Go, where a look-ahead search mechanism is indispensable for building a strong agent. In this paper, we present a novel deep reinforcement learning architecture that can both effectively and efficiently use information on future states in video games. First, we demonstrate that such information is indeed quite useful in deep reinforcement learning by using exact state transition information obtained from the emulator. We then propose a method that predicts future states using a Long Short-Term Memory (LSTM) network, such that the agent can look ahead without the emulator. In this work, we apply our method to the asynchronous advantage actor-critic (A3C) architecture. The experimental results show that our proposed method with predicted future states substantially outperforms the vanilla A3C in several Atari games.
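
As a rough illustration of the idea described above, the following Python (PyTorch) sketch shows one way the policy and value heads of an A3C-style network could be fed the concatenation of current-state features and LSTM-predicted future-state features. It is not the authors' implementation; the network sizes, the single-step look-ahead, and all names (e.g. LookaheadA3CNet) are assumptions made purely for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LookaheadA3CNet(nn.Module):
    """Actor-critic network whose policy/value heads see both current-state
    features and LSTM-predicted future-state features (illustrative only)."""

    def __init__(self, num_actions, feat_dim=256):
        super().__init__()
        # Convolutional encoder for stacked Atari frames (4 x 84 x 84 input assumed).
        self.conv = nn.Sequential(
            nn.Conv2d(4, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
        )
        self.fc = nn.Linear(32 * 9 * 9, feat_dim)
        # LSTM cell that predicts a feature vector for the next state,
        # standing in for one step of look-ahead without querying the emulator.
        self.predictor = nn.LSTMCell(feat_dim, feat_dim)
        # Policy and value heads consume current + predicted-future features.
        self.policy = nn.Linear(2 * feat_dim, num_actions)
        self.value = nn.Linear(2 * feat_dim, 1)

    def forward(self, obs, lstm_state):
        cur = F.relu(self.fc(self.conv(obs).flatten(start_dim=1)))
        h, c = self.predictor(cur, lstm_state)   # predicted future-state features
        joint = torch.cat([cur, h], dim=1)
        return self.policy(joint), self.value(joint), (h, c)

# Usage sketch: one forward pass with a zero-initialized predictor state.
net = LookaheadA3CNet(num_actions=6)
obs = torch.zeros(1, 4, 84, 84)
state = (torch.zeros(1, 256), torch.zeros(1, 256))
logits, value, state = net(obs, state)

In the paper, the usefulness of future-state information is first established with exact transitions from the emulator, and the LSTM predictor then replaces the emulator at decision time; how the predictor is trained and how far it looks ahead are details of the full paper, not of this sketch.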

Notes

  1. Note that both scores are reported in [4, 9]; the evaluation setups may not be the same.

  2. https://github.com/muupan/async-rl.

References

  1. Bellemare, M.G., Naddaf, Y., Veness, J., Bowling, M.: The arcade learning environment: an evaluation platform for general agents. J. Artif. Intell. Res. 47, 253–279 (2013)

  2. Bellemare, M., Srinivasan, S., Ostrovski, G., Schaul, T., Saxton, D., Munos, R.: Unifying count-based exploration and intrinsic motivation. In: Lee, D.D., Sugiyama, M., Luxburg, U.V., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29, pp. 1471–1479. Curran Associates, Inc. (2016)

  3. Bellemare, M., Veness, J., Bowling, M.: Investigating contingency awareness using Atari 2600 games. In: AAAI Conference on Artificial Intelligence, pp. 864–871 (2012)

  4. Guo, X., Singh, S., Lewis, R., Lee, H.: Deep learning for reward design to improve Monte Carlo tree search in Atari games. In: Proceedings of 25th International Joint Conference on Artificial Intelligence, pp. 1519–1525 (2016)

  5. Hausknecht, M., Stone, P.: Deep recurrent Q-learning for partially observable MDPs. In: 2015 AAAI Fall Symposium Series, pp. 29–37 (2015)

  6. Jaderberg, M., Mnih, V., Czarnecki, W.M., Schaul, T., Leibo, J.Z., Silver, D., Kavukcuoglu, K.: Reinforcement learning with unsupervised auxiliary tasks. CoRR abs/1611.05397 (2016). http://arxiv.org/abs/1611.05397

  7. Kocsis, L., Szepesvári, C.: Bandit based Monte-Carlo planning. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) ECML 2006. LNCS (LNAI), vol. 4212, pp. 282–293. Springer, Heidelberg (2006). https://doi.org/10.1007/11871842_29

  8. Lin, L.J.: Reinforcement learning for robots using neural networks. Ph.D. thesis, Carnegie Mellon University (1992)

  9. Mnih, V., Badia, A.P., Mirza, M., Graves, A., Lillicrap, T.P., Harley, T., Silver, D., Kavukcuoglu, K.: Asynchronous methods for deep reinforcement learning. In: Proceedings of the 33rd International Conference on Machine Learning (ICML), pp. 1928–1937 (2016)

  10. Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., Riedmiller, M.: Playing Atari with deep reinforcement learning. In: NIPS Deep Learning Workshop (2013)

  11. Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A.A., Veness, J., Bellemare, M.G., Graves, A., Riedmiller, M., Fidjeland, A.K., Ostrovski, G., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)

  12. Nair, A., Srinivasan, P., Blackwell, S., Alcicek, C., Fearon, R., De Maria, A., Panneershelvam, V., Suleyman, M., Beattie, C., Petersen, S., et al.: Massively parallel methods for deep reinforcement learning. In: ICML Deep Learning Workshop (2015)

  13. Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th International Conference on Machine Learning (ICML-2010), pp. 807–814 (2010)

  14. Osband, I., Blundell, C., Pritzel, A., Van Roy, B.: Deep exploration via bootstrapped DQN. In: Lee, D.D., Sugiyama, M., Luxburg, U.V., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29, pp. 4026–4034. Curran Associates, Inc. (2016)

  15. Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: Proceedings of the 32nd International Conference on Machine Learning (ICML 2015), pp. 1889–1897 (2015)

  16. Silver, D., Huang, A., Maddison, C.J., Guez, A., Sifre, L., van den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T., Hassabis, D.: Mastering the game of Go with deep neural networks and tree search. Nature 529(7587), 484–489 (2016)

  17. Sutton, R.S.: Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In: Proceedings of the 7th International Conference on Machine Learning, pp. 216–224 (1990)

  18. Sutton, R.S.: Dyna, an integrated architecture for learning, planning, and reacting. SIGART Bull. 2(4), 160–163 (1991)

  19. Tang, H., Houthooft, R., Foote, D., Stooke, A., Chen, X., Duan, Y., Schulman, J., De Turck, F., Abbeel, P.: #Exploration: a study of count-based exploration for deep reinforcement learning. In: NIPS Deep Reinforcement Learning Workshop (2016)

  20. Tieleman, T., Hinton, G.: Lecture 6e RMSprop: divide the gradient by a running average of its recent magnitude. Coursera: Neural Networks for Machine Learning (2012)

  21. Tokui, S., Oono, K., Hido, S., Clayton, J.: Chainer: a next-generation open source framework for deep learning. In: Proceedings of Workshop on Machine Learning Systems (LearningSys) in The Twenty-Ninth Annual Conference on Neural Information Processing Systems (NIPS) (2015)

  22. Van Hasselt, H., Guez, A., Silver, D.: Deep reinforcement learning with double Q-learning. In: AAAI Conference on Artificial Intelligence, pp. 2094–2100 (2016)

  23. Wang, Z., de Freitas, N., Lanctot, M.: Dueling network architectures for deep reinforcement learning. CoRR abs/1511.06581 (2015). http://arxiv.org/abs/1511.06581

  24. Watkins, C.J.C.H.: Learning from delayed rewards. Ph.D. thesis, University of Cambridge, England (1989)

  25. Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8689, pp. 818–833. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10590-1_53


Author information

Corresponding author

Correspondence to Hirotaka Kameko.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Kameko, H., Suzuki, J., Mizukami, N., Tsuruoka, Y. (2018). Deep Reinforcement Learning with Hidden Layers on Future States. In: Cazenave, T., Winands, M., Saffidine, A. (eds) Computer Games. CGW 2017. Communications in Computer and Information Science, vol 818. Springer, Cham. https://doi.org/10.1007/978-3-319-75931-9_4

  • DOI: https://doi.org/10.1007/978-3-319-75931-9_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-75930-2

  • Online ISBN: 978-3-319-75931-9

  • eBook Packages: Computer Science, Computer Science (R0)
