Abstract
We propose COMPact Experience Replay (COMPER), a reinforcement learning method that seeks to reduce the number of experiences required to train an agent while still maximizing the total reward accumulated in the long run. COMPER uses temporal difference learning with predicted target values for sets of similar transitions, together with a new experience replay approach based on two transition memories. We assess two possible neural network architectures for the target network, with a complete analysis of the memories' behavior, and report detailed results for 100,000 frames and about 25,000 iterations with a small experience memory on eight challenging Atari 2600 games in the Arcade Learning Environment (ALE). As a baseline, we also present results for a Deep Q-Network (DQN) agent under the same experimental protocol on the same set of games. We demonstrate that COMPER can approximate a good policy from a small number of frame observations by using a compact memory and learning the dynamics of the sets of similar transitions with a recurrent neural network.
This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior – Brasil (CAPES) – Finance Code 001. It also received partial funding from the Brazilian Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) and from the Pontifical Catholic University of Minas Gerais (PUC Minas).
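Only the abstract is available on this page, so the following Python sketch is a rough, assumption-laden illustration of the idea it describes: transitions are grouped into sets of similar transitions in a compact memory, and the temporal-difference update bootstraps from a target value predicted for the whole set rather than from a single stored transition. All names (SimilarTransitionMemory, q_learning_step), the discretization-based similarity criterion, and the running-mean "predictor" are placeholders introduced here for illustration; COMPER itself maintains two transition memories and uses a recurrent neural network to predict these targets.

# Minimal sketch of the similar-transition replay idea described in the
# abstract. Names, the similarity criterion, and the averaging "target
# predictor" are assumptions; they are not the authors' implementation.
import numpy as np
from collections import defaultdict

class SimilarTransitionMemory:
    """Groups transitions that are considered similar into a single set.

    Similarity is approximated here by coarse discretization of (state,
    action); the abstract does not specify COMPER's actual criterion.
    """
    def __init__(self, precision=1):
        self.sets = defaultdict(list)   # set key -> list of observed TD targets
        self.precision = precision

    def key(self, state, action):
        return (tuple(np.round(state, self.precision)), int(action))

    def add(self, state, action, td_target):
        self.sets[self.key(state, action)].append(td_target)

    def predicted_target(self, state, action, fallback):
        """Stand-in for the recurrent target network: the mean of targets
        stored for the matching set of similar transitions."""
        targets = self.sets.get(self.key(state, action))
        return float(np.mean(targets)) if targets else fallback


def q_learning_step(Q, transition, memory, gamma=0.99, lr=0.1):
    """One TD update that bootstraps from the value predicted for the
    transition's similarity set instead of a single sampled transition."""
    s, a, r, s_next, done = transition
    bootstrap = r + (0.0 if done else gamma * np.max(Q[s_next]))
    memory.add(np.array([s]), a, bootstrap)         # update the compact memory
    target = memory.predicted_target(np.array([s]), a, bootstrap)
    Q[s, a] += lr * (target - Q[s, a])               # TD update toward predicted target
    return Q

# Tiny usage example on a toy 3-state, 2-action problem.
Q = np.zeros((3, 2))
mem = SimilarTransitionMemory()
for t in [(0, 1, 1.0, 1, False), (0, 1, 0.5, 1, False), (1, 0, 2.0, 2, True)]:
    Q = q_learning_step(Q, t, mem, gamma=0.9, lr=0.5)
print(Q)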
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Neves, D.E., Ishitani, L., do Patrocínio Júnior, Z.K.G. (2022). When Less May Be More: Exploring Similarity to Improve Experience Replay. In: Xavier-Junior, J.C., Rios, R.A. (eds.) Intelligent Systems. BRACIS 2022. Lecture Notes in Computer Science, vol. 13654. Springer, Cham. https://doi.org/10.1007/978-3-031-21689-3_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21688-6
Online ISBN: 978-3-031-21689-3