Accelerating Spatio-Temporal Deep Reinforcement Learning Model for Game Strategy

  • Conference paper: Neural Information Processing (ICONIP 2018)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11303)

Abstract

Deep reinforcement learning has developed rapidly in recent years, and many deep reinforcement learning models have been applied to simple game environments. However, many real-world applications involve environments far more complex than simple games, so the performance of deep reinforcement learning models needs to be improved in several respects. In this paper, we explore two such respects: faster training and stronger spatio-temporal representation. For the former, we propose to use depthwise separable Convolutional Neural Networks (CNNs) to accelerate the deep reinforcement learning model. For the latter, we introduce the convolutional long short-term memory network (ConvLSTM) to improve the expressiveness of spatio-temporal features. We evaluate the models in StarCraft II [1], a strategy game whose environment is complex for reinforcement learning. All of the agents learn game strategies of a certain level, such as 'siege' and 'searching'. The experimental results show that the depthwise separable CNN clearly shortens training time, and that the ConvLSTM's stronger spatial and temporal feature representation improves the performance of the agents.
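
The page does not reproduce the authors' network, so the following is only a minimal sketch of how the two ideas in the abstract can be combined, written with tf.keras; all layer sizes, the input resolution, and the function name build_agent_encoder are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch (not the authors' code): depthwise separable convolutions
# for per-frame feature extraction, followed by a ConvLSTM whose recurrent
# state stays spatial. All shapes and hyperparameters are hypothetical.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_agent_encoder(timesteps=8, height=64, width=64, channels=17):
    # A stack of game-screen feature maps over time:
    # (batch, timesteps, height, width, channels).
    frames = layers.Input(shape=(timesteps, height, width, channels))

    # Depthwise separable convolution (cf. MobileNets [32]) factorizes a
    # standard k x k convolution into a per-channel depthwise conv plus a
    # 1x1 pointwise conv, cutting its cost by roughly 1/N + 1/k^2 for N
    # output channels; this is the source of the training speed-up.
    per_frame = models.Sequential([
        layers.SeparableConv2D(32, 3, strides=2, padding="same", activation="relu"),
        layers.SeparableConv2D(64, 3, strides=2, padding="same", activation="relu"),
    ])
    features = layers.TimeDistributed(per_frame)(frames)  # applied frame by frame

    # ConvLSTM [33] replaces the matrix multiplications in the LSTM gates
    # with convolutions, so the hidden state keeps the 2-D layout of the
    # map -- the spatio-temporal representation the abstract refers to.
    state = layers.ConvLSTM2D(64, 3, padding="same")(features)

    return models.Model(frames, state)  # state: (batch, 16, 16, 64)

model = build_agent_encoder()
model.summary()
```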

References

  1. Vinyals, O., et al.: StarCraft II: A New Challenge for Reinforcement Learning. https://arxiv.org/abs/1708.04782. Accessed 16 Aug 2017

  2. Yu, K., Jia, L., Chen, Y., Xu, W.: Deep learning: yesterday, today, and tomorrow. J. Comput. Res. Develop. 20(6), 1349 (2013)

  3. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: International Conference on Neural Information Processing Systems, pp. 1097–1105. Curran Associates Inc. (2012)

  4. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115(3), 211–252 (2015)

  5. Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., Li, F.F.: Large-scale video classification with convolutional neural networks. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1725–1732. IEEE Computer Society (2014)

  6. Cho, K., van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., Bengio, Y.: Learning phrase representations using RNN encoder-decoder for statistical machine translation. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, Doha, Qatar, pp. 1724–1734 (2014)

  7. Yang, Z., Tao, D.P., Zhang, S.Y., Jin, L.W.: Similar handwritten Chinese character recognition based on deep neural networks with big data. J. Commun. 35(9), 184–189 (2014)

  8. Graves, A., Mohamed, A.R., Hinton, G.: Speech recognition with deep recurrent neural networks. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 6645–6649. IEEE (2013)

  9. Li, Y., Zhang, J., Pan, D., Hu, D.: A study of speech recognition based on RNN-RBM language model. J. Comput. Res. Develop. 51(9), 1936–1944 (2014)

  10. Sun, Z.J., Xue, L., Xu, Y.M., Wang, Z.: Overview of deep learning. Appl. Res. Comput. 29(8), 2806–2810 (2012)

  11. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)

  12. Kober, J., Peters, J.: Reinforcement learning in robotics: a survey. Int. J. Robot. Res. 32(11), 1238–1274 (2013)

  13. Tesauro, G.: TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Comput. 6(2), 215–219 (1994)

  14. Kocsis, L., Szepesvári, C.: Bandit based Monte-Carlo planning. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) ECML 2006. LNCS (LNAI), vol. 4212, pp. 282–293. Springer, Heidelberg (2006). https://doi.org/10.1007/11871842_29

  15. Fu, Q.M., Liu, Q., Wang, H., Xiao, F., Yu, J., Li, J.: A novel off policy Q(λ) algorithm based on linear function approximation. Chin. J. Comput. 37(3), 677–686 (2014)

  16. Gao, Y., Zhou, R.Y., Wang, H., Cao, Z.X.: Study on an average reward reinforcement learning algorithm. Chin. J. Comput. 30(8), 1372–1378 (2007)

  17. Wei, Y.Z., Zhao, M.Y.: A reinforcement learning-based approach to dynamic job-shop scheduling. Acta Autom. Sin. 31(5), 765–771 (2005)

  18. Ipek, E., Mutlu, O., Caruana, R.: Self-optimizing memory controllers: a reinforcement learning approach. In: International Symposium on Computer Architecture, pp. 39–50. IEEE (2008)

  19. Mnih, V., et al.: Playing Atari with deep reinforcement learning. https://arxiv.org/abs/1312.5602 (2013)

  20. Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)

  21. Silver, D., et al.: Mastering the game of Go with deep neural networks and tree search. Nature 529(7587), 484–489 (2016)

  22. Oh, J., Guo, X., Lee, H., Lewis, R., Singh, S.: Action-conditional video prediction using deep networks in Atari games. In: Proceedings of the Neural Information Processing Systems, Montreal, Canada, pp. 2863–2871 (2015)

  23. Caicedo, J.C., Lazebnik, S.: Active object localization with deep reinforcement learning. In: IEEE International Conference on Computer Vision, pp. 2488–2496. IEEE Computer Society (2015)

  24. Lillicrap, T.P., et al.: Continuous control with deep reinforcement learning. https://arxiv.org/abs/1509.02971 (2015)

  25. Duan, Y., Chen, X., Houthooft, R., Schulman, J., Abbeel, P.: Benchmarking deep reinforcement learning for continuous control. In: Proceedings of the 33rd International Conference on Machine Learning, pp. 1329–1338 (2016)

  26. Gu, S., Lillicrap, T., Sutskever, I., Levine, S.: Continuous deep Q-learning with model-based acceleration. In: Proceedings of the 33rd International Conference on Machine Learning, vol. 48, pp. 2829–2838 (2016)

  27. Hansen, S.: Using deep Q-learning to control optimization hyperparameters. https://arxiv.org/abs/1602.04062v2. Accessed 19 Jun 2016

  28. Andrychowicz, M., et al.: Learning to learn by gradient descent by gradient descent. In: Proceedings of the Conference on Neural Information Processing Systems, Barcelona, Spain, pp. 3981–3989 (2016)

  29. Mnih, V., et al.: Asynchronous methods for deep reinforcement learning. In: Proceedings of the International Conference on Machine Learning, New York, USA, pp. 1928–1937 (2016)

  30. Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal Policy Optimization Algorithms. https://arxiv.org/abs/1707.06347v2. Accessed 28 Aug 2017

  31. Schulman, J., Levine, S., Moritz, P., Jordan, M.I., Abbeel, P.: Trust region policy optimization. In: Proceedings of the 32nd International Conference on Machine Learning, pp. 1889–1897 (2015)

  32. Howard, A.G., et al.: MobileNets: efficient convolutional neural networks for mobile vision applications. https://arxiv.org/abs/1704.04861. Accessed 17 Apr 2017

  33. Shi, X., Chen, Z., Wang, H., Yeung, D.Y., Wong, W., Woo, W.: Convolutional LSTM Network: a machine learning approach for precipitation nowcasting. In: International Conference on Neural Information Processing Systems, pp. 802–810. MIT Press (2015)

  34. Jaderberg, M., et al.: Reinforcement Learning with Unsupervised Auxiliary Tasks. https://arxiv.org/abs/1611.05397. Accessed 16 Nov 2016

Acknowledgements

This work is funded by the Shanghai Undergraduate Student Innovation Project, the National Natural Science Foundation of China (No. 61170155), and the Shanghai Innovation Action Plan Project (No. 16511101200).

Author information

Corresponding author: Yuchun Fang.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Li, Y., Fang, Y. (2018). Accelerating Spatio-Temporal Deep Reinforcement Learning Model for Game Strategy. In: Cheng, L., Leung, A., Ozawa, S. (eds.) Neural Information Processing. ICONIP 2018. Lecture Notes in Computer Science, vol. 11303. Springer, Cham. https://doi.org/10.1007/978-3-030-04182-3_27

  • DOI: https://doi.org/10.1007/978-3-030-04182-3_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-04181-6

  • Online ISBN: 978-3-030-04182-3

  • eBook Packages: Computer Science, Computer Science (R0)
