Abstract
Deep reinforcement learning algorithms such as Deep Q-Networks have been used successfully to build strong agents for Atari games by directly evaluating only the current state and actions. This is in stark contrast to algorithms for traditional board games such as Chess or Go, where a look-ahead search mechanism is indispensable for building a strong agent. In this paper, we present a novel deep reinforcement learning architecture that can use information about future states in video games both effectively and efficiently. First, we demonstrate that such information is indeed quite useful in deep reinforcement learning by using exact state transition information obtained from the emulator. We then propose a method that predicts future states with a Long Short-Term Memory (LSTM) network, so that the agent can look ahead without the emulator. We apply our method to the asynchronous advantage actor-critic (A3C) architecture. Experimental results show that our method with predicted future states substantially outperforms the vanilla A3C on several Atari games.
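To make the idea in the abstract concrete, below is a minimal PyTorch sketch of one way such an architecture could be wired up: a convolutional encoder produces a feature vector for the current frame stack, an LSTM cell rolls that feature forward under a sequence of candidate actions to predict hidden features of future states (standing in for the emulator), and the policy and value heads of an A3C-style agent consume the current and predicted features together. All class and parameter names, layer sizes, and the action-conditioned rollout are our own assumptions for illustration; this is not the authors' implementation.

```python
# Hypothetical sketch (not the authors' code): an A3C-style network whose
# policy and value heads also see LSTM-predicted features of future states.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LookaheadA3C(nn.Module):
    def __init__(self, n_actions: int, feat_dim: int = 256, lookahead: int = 1):
        super().__init__()
        self.n_actions = n_actions
        self.lookahead = lookahead
        # Standard Atari-style convolutional encoder (layer sizes assumed).
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 9 * 9, feat_dim), nn.ReLU(),
        )
        # LSTM cell that rolls the current feature vector forward one
        # action-conditioned step at a time, in place of the emulator.
        self.predictor = nn.LSTMCell(feat_dim + n_actions, feat_dim)
        # Policy and value heads consume current + predicted features jointly.
        self.policy = nn.Linear(feat_dim * (1 + lookahead), n_actions)
        self.value = nn.Linear(feat_dim * (1 + lookahead), 1)

    def forward(self, obs, planned_actions):
        # obs: (B, 4, 84, 84) stacked frames; planned_actions: (B, lookahead) int64.
        feat = self.encoder(obs)
        h = torch.zeros_like(feat)
        c = torch.zeros_like(feat)
        cur, feats = feat, [feat]
        for t in range(self.lookahead):
            a = F.one_hot(planned_actions[:, t], self.n_actions).float()
            h, c = self.predictor(torch.cat([cur, a], dim=1), (h, c))
            feats.append(h)  # predicted feature of the state t+1 steps ahead
            cur = h          # roll the prediction forward
        joint = torch.cat(feats, dim=1)
        return F.softmax(self.policy(joint), dim=1), self.value(joint)
```

In the method as described, the candidate actions over which the agent looks ahead would come from its own rollout rather than being given externally; how they are chosen and how the predictor is trained are design decisions the paper itself addresses, and the sketch above only fixes one plausible interface.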
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Kameko, H., Suzuki, J., Mizukami, N., Tsuruoka, Y. (2018). Deep Reinforcement Learning with Hidden Layers on Future States. In: Cazenave, T., Winands, M., Saffidine, A. (eds) Computer Games. CGW 2017. Communications in Computer and Information Science, vol 818. Springer, Cham. https://doi.org/10.1007/978-3-319-75931-9_4
DOI: https://doi.org/10.1007/978-3-319-75931-9_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-75930-2
Online ISBN: 978-3-319-75931-9
eBook Packages: Computer Science, Computer Science (R0)