Abstract
Value-based reinforcement learning algorithms train agents by storing and replaying past experiences; however, sampling every stored transition with equal probability slows learning, since in reality samples differ in importance. Prioritized experience replay greatly improves the learning rate, yet good experiences and more effective strategies may still be ignored or missed. To overcome these two shortcomings, this article proposes Deep Q-learning with phased experience replay (MixDQN), which uses prioritized sampling to speed up training in the early stage and uniform random sampling in the later stage to make full use of good experiences. Experiments on three classic control problems from OpenAI Gym show that MixDQN enables an agent to learn more stably, quickly, and efficiently.
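To make the phased sampling scheme concrete, below is a minimal sketch of a replay buffer that draws transitions with priority-proportional probability before a fixed switch point and uniformly afterwards. It assumes proportional prioritization with a hard phase boundary; the names `PhasedReplayBuffer`, `switch_step`, and `alpha` are illustrative and are not taken from the paper.

```python
# Minimal sketch of phased experience replay: prioritized sampling in the
# early phase, uniform sampling in the later phase. Assumes proportional
# prioritization and a fixed switch step; all names are illustrative.
import numpy as np


class PhasedReplayBuffer:
    def __init__(self, capacity, switch_step, alpha=0.6):
        self.capacity = capacity        # max number of stored transitions
        self.switch_step = switch_step  # step at which sampling becomes uniform
        self.alpha = alpha              # how strongly priority shapes sampling
        self.buffer = []                # stored (s, a, r, s_next, done) tuples
        self.priorities = []            # one priority per stored transition
        self.pos = 0                    # ring-buffer write position

    def add(self, transition):
        # New transitions get the current maximum priority so they are
        # replayed at least once before their TD error is known.
        max_p = max(self.priorities, default=1.0)
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
            self.priorities.append(max_p)
        else:
            self.buffer[self.pos] = transition
            self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, step):
        if step < self.switch_step:
            # Early phase: sample with probability proportional to priority^alpha.
            probs = np.asarray(self.priorities) ** self.alpha
            probs /= probs.sum()
            idx = np.random.choice(len(self.buffer), batch_size, p=probs)
        else:
            # Later phase: uniform sampling, so good but low-error
            # experiences are not crowded out.
            idx = np.random.choice(len(self.buffer), batch_size)
        return idx, [self.buffer[i] for i in idx]

    def update_priorities(self, idx, td_errors, eps=1e-6):
        # After a learning step, refresh priorities with the new TD errors.
        for i, err in zip(idx, td_errors):
            self.priorities[i] = abs(err) + eps
```

In a DQN training loop, `add()` would be called after every environment step, `sample(batch_size, step)` before each gradient update, and `update_priorities()` with the resulting TD errors of the sampled batch.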
Supported in part by the National Natural Science Foundation of China under Grant 61572074 and in part by the China Scholarship Council under Grant 201706465028.
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
Cite this paper
Wang, H., Zeng, F., Tu, X. (2019). Deep Q-Learning with Phased Experience Cooperation. In: Sun, Y., Lu, T., Yu, Z., Fan, H., Gao, L. (eds) Computer Supported Cooperative Work and Social Computing. ChineseCSCW 2019. Communications in Computer and Information Science, vol 1042. Springer, Singapore. https://doi.org/10.1007/978-981-15-1377-0_58
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-1376-3
Online ISBN: 978-981-15-1377-0