Accelerating Spatio-Temporal Deep Reinforcement Learning Model for Game Strategy

  • Conference paper: Neural Information Processing (ICONIP 2018)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11303)

Abstract

Deep reinforcement learning has developed rapidly in recent years, and many deep reinforcement learning models have been applied to simple game environments. However, many real-world applications involve environments far more complex than simple games, so the performance of deep reinforcement learning models needs to be improved in several respects. In this paper, we explore two such respects: faster training and stronger spatio-temporal representation. For the former, we propose to use depthwise separable Convolutional Neural Networks (CNNs) to accelerate the deep reinforcement learning model. For the latter, we introduce the convolutional long short-term memory network (ConvLSTM) to improve the expressiveness of spatio-temporal features. We evaluate the models in StarCraft II [1], a strategy game whose environment is complex for reinforcement learning. All of the agents learn game strategies of a certain level, such as 'siege' and 'searching'. The experimental results show that the depthwise separable CNN clearly shortens training time, and that the ConvLSTM's stronger spatial and temporal feature representation improves the performance of the agents.
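
The page does not reproduce the authors' network, so the following is only a minimal sketch of how the two ideas in the abstract can be combined, written with tf.keras; all layer sizes, the input resolution, and the function name build_agent_encoder are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch (not the authors' code): depthwise separable convolutions
# for per-frame feature extraction, followed by a ConvLSTM whose recurrent
# state stays spatial. All shapes and hyperparameters are hypothetical.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_agent_encoder(timesteps=8, height=64, width=64, channels=17):
    # A stack of game-screen feature maps over time:
    # (batch, timesteps, height, width, channels).
    frames = layers.Input(shape=(timesteps, height, width, channels))

    # Depthwise separable convolution (cf. MobileNets [32]) factorizes a
    # standard k x k convolution into a per-channel depthwise conv plus a
    # 1x1 pointwise conv, cutting its cost by roughly 1/N + 1/k^2 for N
    # output channels; this is the source of the training speed-up.
    per_frame = models.Sequential([
        layers.SeparableConv2D(32, 3, strides=2, padding="same", activation="relu"),
        layers.SeparableConv2D(64, 3, strides=2, padding="same", activation="relu"),
    ])
    features = layers.TimeDistributed(per_frame)(frames)  # applied frame by frame

    # ConvLSTM [33] replaces the matrix multiplications in the LSTM gates
    # with convolutions, so the hidden state keeps the 2-D layout of the
    # map -- the spatio-temporal representation the abstract refers to.
    state = layers.ConvLSTM2D(64, 3, padding="same")(features)

    return models.Model(frames, state)  # state: (batch, 16, 16, 64)

model = build_agent_encoder()
model.summary()
```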

References

  1. Vinyals, O., et al.: StarCraft II: A New Challenge for Reinforcement Learning. https://arxiv.org/abs/1708.04782. Accessed 16 Aug 2017

  2. Yu, K., Jia, L., Chen, Y., Xu, W.: Deep learning: yesterday, today, and tomorrow. J. Comput. Res. Develop. 20(6), 1349 (2013)

  3. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: International Conference on Neural Information Processing Systems, pp. 1097–1105. Curran Associates Inc. (2012)

  4. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115(3), 211–252 (2015)

  5. Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., Li, F.F.: Large-scale video classification with convolutional neural networks. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1725–1732. IEEE Computer Society (2014)

  6. Cho, K., van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., Bengio, Y.: Learning phrase representations using RNN encoder-decoder for statistical machine translation. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, Doha, Qatar, pp. 1724–1734 (2014)

  7. Yang, Z., Tao, D.P., Zhang, S.Y., Jin, L.W.: Similar handwritten Chinese character recognition based on deep neural networks with big data. J. Commun. 35(9), 184–189 (2014)

  8. Graves, A., Mohamed, A.R., Hinton, G.: Speech recognition with deep recurrent neural networks. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 6645–6649. IEEE (2013)

  9. Li, Y., Zhang, J., Pan, D., Hu, D.: A study of speech recognition based on RNN-RBM language model. J. Comput. Res. Develop. 51(9), 1936–1944 (2014)

  10. Sun, Z.J., Xue, L., Xu, Y.M., Wang, Z.: Overview of deep learning. Appl. Res. Comput. 29(8), 2806–2810 (2012)

  11. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)

  12. Kober, J., Peters, J.: Reinforcement learning in robotics: a survey. Int. J. Robot. Res. 32(11), 1238–1274 (2013)

  13. Tesauro, G.: TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Comput. 6(2), 215–219 (1994)

  14. Kocsis, L., Szepesvári, C.: Bandit based Monte-Carlo planning. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) ECML 2006. LNCS (LNAI), vol. 4212, pp. 282–293. Springer, Heidelberg (2006). https://doi.org/10.1007/11871842_29

  15. Fu, Q.M., Liu, Q., Wang, H., Xiao, F., Yu, J., Li, J.: A novel off policy Q(λ) algorithm based on linear function approximation. Chin. J. Comput. 37(3), 677–686 (2014)

  16. Gao, Y., Zhou, R.Y., Wang, H., Cao, Z.X.: Study on an average reward reinforcement learning algorithm. Chin. J. Comput. 30(8), 1372–1378 (2007)

  17. Wei, Y.Z., Zhao, M.Y.: A reinforcement learning-based approach to dynamic job-shop scheduling. Acta Autom. Sin. 31(5), 765–771 (2005)

  18. Ipek, E., Mutlu, O., Caruana, R.: Self-optimizing memory controllers: a reinforcement learning approach. In: International Symposium on Computer Architecture, pp. 39–50. IEEE (2008)

  19. Mnih, V., et al.: Playing Atari with deep reinforcement learning. https://arxiv.org/abs/1312.5602 (2013)

  20. Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)

  21. Silver, D., et al.: Mastering the game of Go with deep neural networks and tree search. Nature 529(7587), 484–489 (2016)

  22. Oh, J., Guo, X., Lee, H., Lewis, R., Singh, S.: Action-conditional video prediction using deep networks in Atari games. In: Proceedings of the Neural Information Processing Systems, Montreal, Canada, pp. 2863–2871 (2015)

  23. Caicedo, J.C., Lazebnik, S.: Active object localization with deep reinforcement learning. In: IEEE International Conference on Computer Vision, pp. 2488–2496. IEEE Computer Society (2015)

  24. Lillicrap, T.P., et al.: Continuous control with deep reinforcement learning. https://arxiv.org/abs/1509.02971 (2015)

  25. Duan, Y., Chen, X., Houthooft, R., Schulman, J., Abbeel, P.: Benchmarking deep reinforcement learning for continuous control. In: Proceedings of the 33rd International Conference on Machine Learning, pp. 1329–1338 (2016)

  26. Gu, S., Lillicrap, T., Sutskever, I., Levine, S.: Continuous deep Q-learning with model-based acceleration. In: Proceedings of the 33rd International Conference on Machine Learning, vol. 48, pp. 2829–2838 (2016)

  27. Hansen, S.: Using deep Q-learning to control optimization hyperparameters. https://arxiv.org/abs/1602.04062v2. Accessed 19 Jun 2016

  28. Andrychowicz, M., et al.: Learning to learn by gradient descent by gradient descent. In: Proceedings of the Conference on Neural Information Processing Systems, Barcelona, Spain, pp. 3981–3989 (2016)

  29. Mnih, V., et al.: Asynchronous methods for deep reinforcement learning. In: Proceedings of the International Conference on Machine Learning, New York, USA, pp. 1928–1937 (2016)

  30. Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal Policy Optimization Algorithms. https://arxiv.org/abs/1707.06347v2. Accessed 28 Aug 2017

  31. Schulman, J., Levine, S., Moritz, P., Jordan, M.I., Abbeel, P.: Trust region policy optimization. In: Proceedings of the 32nd International Conference on Machine Learning, pp. 1889–1897 (2015)

  32. Howard, A.G., et al.: MobileNets: efficient convolutional neural networks for mobile vision applications. https://arxiv.org/abs/1704.04861. Accessed 17 Apr 2017

  33. Shi, X., Chen, Z., Wang, H., Yeung, D.Y., Wong, W., Woo, W.: Convolutional LSTM Network: a machine learning approach for precipitation nowcasting. In: International Conference on Neural Information Processing Systems, pp. 802–810. MIT Press (2015)

  34. Jaderberg, M., et al.: Reinforcement Learning with Unsupervised Auxiliary Tasks. https://arxiv.org/abs/1611.05397. Accessed 16 Nov 2016

Acknowledgements

This work is funded by the Shanghai Undergraduate Student Innovation Project, the National Natural Science Foundation of China (No. 61170155), and the Shanghai Innovation Action Plan Project (No. 16511101200).

Author information

Corresponding author: Yuchun Fang.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Li, Y., Fang, Y. (2018). Accelerating Spatio-Temporal Deep Reinforcement Learning Model for Game Strategy. In: Cheng, L., Leung, A., Ozawa, S. (eds.) Neural Information Processing. ICONIP 2018. Lecture Notes in Computer Science, vol. 11303. Springer, Cham. https://doi.org/10.1007/978-3-030-04182-3_27

  • DOI: https://doi.org/10.1007/978-3-030-04182-3_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-04181-6

  • Online ISBN: 978-3-030-04182-3

  • eBook Packages: Computer Science, Computer Science (R0)
