Abstract
The combination of modern reinforcement learning and deep learning brings significant breakthroughs to domains that require both rich perception of high-dimensional sensory inputs and policy selection. A recent breakthrough in using deep neural networks as function approximators, termed Deep Q-Networks (DQN), has proved powerful on problems approaching real-world complexity, such as Atari 2600 games. To remove temporal correlation between observed transitions, DQN uses a sampling mechanism called experience replay, which simply replays transitions drawn uniformly at random from a memory buffer. However, uniform sampling ignores the relative importance of the transitions stored in the buffer. In this paper, we introduce prioritized sampling into DQN as an alternative. Our experimental results demonstrate that DQN with prioritized sampling achieves better performance, in terms of both average score and learning speed, on four Atari 2600 games.
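The abstract does not specify how the importance of a transition is measured; a common choice, and the one assumed in the minimal sketch below, is the magnitude of the TD error. The class name PrioritizedReplayBuffer and its methods are illustrative, not the authors' implementation.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Replay buffer that samples transitions in proportion to a priority
    (here assumed to be the TD-error magnitude) rather than uniformly."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = []      # stored transitions (s, a, r, s_next, done)
        self.priorities = []  # one priority per stored transition
        self.pos = 0          # next write position (circular buffer)

    def add(self, transition):
        # Give new transitions the current maximum priority so each is
        # replayed at least once before its TD error has been measured.
        max_prio = max(self.priorities, default=1.0)
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
            self.priorities.append(max_prio)
        else:
            self.buffer[self.pos] = transition
            self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        # Sampling probability is proportional to priority, so transitions
        # with large TD error are replayed more often than under the
        # uniform sampling of standard DQN experience replay.
        probs = np.asarray(self.priorities)
        probs = probs / probs.sum()
        idxs = np.random.choice(len(self.buffer), batch_size, p=probs)
        return idxs, [self.buffer[i] for i in idxs]

    def update_priorities(self, idxs, td_errors):
        # After a learning step, refresh the priorities of the sampled
        # transitions with their newly computed TD errors.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + 1e-6  # epsilon keeps p > 0
```

Note that proportional sampling biases the distribution of replayed transitions; importance-sampling weights are one way to correct for this, but the sketch omits them for brevity.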
Acknowledgements
This work was funded by the National Natural Science Foundation (61272005, 61303108, 61373094, 61502323, 61472262), the Natural Science Foundation of Jiangsu (BK2012616), the High School Natural Foundation of Jiangsu (13KJB520020), the Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University (93K172014K04), and the Suzhou Industrial Application of Basic Research Program (SYG201308, SYG201422). We would also like to thank the reviewers for their helpful comments.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Zhai, J. et al. (2016). Deep Q-Learning with Prioritized Sampling. In: Hirose, A., Ozawa, S., Doya, K., Ikeda, K., Lee, M., Liu, D. (eds) Neural Information Processing. ICONIP 2016. Lecture Notes in Computer Science, vol. 9947. Springer, Cham. https://doi.org/10.1007/978-3-319-46687-3_2
DOI: https://doi.org/10.1007/978-3-319-46687-3_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46686-6
Online ISBN: 978-3-319-46687-3