Abstract
Advances in deep reinforcement learning have enabled autonomous agents to perform well on video games, often outperforming humans, using only raw pixels as input. However, timely context awareness is not fully integrated into these agents. In this paper, we extend the Deep Q-network (DQN) with a spatio-temporal architecture, a novel framework that addresses this temporal limitation. To incorporate spatio-temporal information, we construct several architecture variants that feed spatial and temporal representations into the Deep Q-network in different ways: DQN with a convolutional neural network (DQN-Conv), DQN with an LSTM recurrent neural network (DQN-LSTM), DQN with a 3D convolutional neural network (DQN-3DConv), and DQN with spatial and temporal fusion (DQN-Fusion). These variants allow us to explore the interplay between spatial and temporal information. Extensive experiments on the popular mobile game Flappy Bird show that our framework achieves superior results compared with baseline models.
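All of the DQN variants described above are trained with the standard Q-learning bootstrap target. As a minimal sketch of that shared training step (the function name, batch layout, and discount factor below are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def dqn_targets(rewards, next_q_values, dones, gamma=0.99):
    """Standard DQN training targets: y = r + gamma * max_a' Q(s', a'),
    with the bootstrap term zeroed out on terminal transitions."""
    max_next_q = next_q_values.max(axis=1)          # best action value in s'
    return rewards + gamma * max_next_q * (1.0 - dones)

# Toy batch of two transitions (two actions per state).
rewards = np.array([1.0, 0.0])
next_q = np.array([[0.5, 2.0],    # max = 2.0
                   [1.0, 0.2]])   # terminal: bootstrap term is ignored
dones = np.array([0.0, 1.0])
print(dqn_targets(rewards, next_q, dones))  # [2.98, 0.0]
```

The variants differ only in how the Q-network that produces `next_q_values` consumes its input: a single frame (DQN-Conv), a recurrent state over frames (DQN-LSTM), a stacked frame volume (DQN-3DConv), or a fused spatial-temporal representation (DQN-Fusion).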
Acknowledgement
This work was supported by the National Key R&D Program of China (No. 2017YFE0117500).
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Liu, Zy., Liu, Jw., Li, W., Zuo, X. (2020). Deep Reinforcement Learning with Temporal-Awareness Network. In: Yang, H., Pasupa, K., Leung, A.CS., Kwok, J.T., Chan, J.H., King, I. (eds) Neural Information Processing. ICONIP 2020. Lecture Notes in Computer Science, vol. 12533. Springer, Cham. https://doi.org/10.1007/978-3-030-63833-7_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63832-0
Online ISBN: 978-3-030-63833-7