Abstract
We propose COMPact Experience Replay (COMPER), a reinforcement learning method that seeks to reduce the number of experiences required to train an agent while still maximizing the total reward accumulated in the long run. COMPER uses temporal difference learning with predicted target values for sets of similar transitions, together with a new experience replay approach based on two transition memories. We assess two possible neural network architectures for the target network, with a complete analysis of the memories' behavior, and report detailed results for 100,000 frames and about 25,000 iterations with a small experience memory on eight challenging Atari 2600 games in the Arcade Learning Environment (ALE). As a baseline, we also present results for a Deep Q-Network (DQN) agent under the same experimental protocol on the same set of games. We demonstrate that COMPER can approximate a good policy from a small number of frame observations by using a compact memory and learning the dynamics of the sets of similar transitions with a recurrent neural network.
This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior – Brasil (CAPES) – Finance Code 001. It also received partial funding from the Brazilian Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) and from the Pontifical Catholic University of Minas Gerais (PUC Minas).
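Only the abstract is available on this page, so the following Python sketch is a rough, assumption-laden illustration of the idea it describes: transitions are grouped into sets of similar transitions in a compact memory, and the temporal-difference update bootstraps from a target value predicted for the whole set rather than from a single stored transition. All names (SimilarTransitionMemory, q_learning_step), the discretization-based similarity criterion, and the running-mean "predictor" are placeholders introduced here for illustration; COMPER itself maintains two transition memories and uses a recurrent neural network to predict these targets.

# Minimal sketch of the similar-transition replay idea described in the
# abstract. Names, the similarity criterion, and the averaging "target
# predictor" are assumptions; they are not the authors' implementation.
import numpy as np
from collections import defaultdict

class SimilarTransitionMemory:
    """Groups transitions that are considered similar into a single set.

    Similarity is approximated here by coarse discretization of (state,
    action); the abstract does not specify COMPER's actual criterion.
    """
    def __init__(self, precision=1):
        self.sets = defaultdict(list)   # set key -> list of observed TD targets
        self.precision = precision

    def key(self, state, action):
        return (tuple(np.round(state, self.precision)), int(action))

    def add(self, state, action, td_target):
        self.sets[self.key(state, action)].append(td_target)

    def predicted_target(self, state, action, fallback):
        """Stand-in for the recurrent target network: the mean of targets
        stored for the matching set of similar transitions."""
        targets = self.sets.get(self.key(state, action))
        return float(np.mean(targets)) if targets else fallback


def q_learning_step(Q, transition, memory, gamma=0.99, lr=0.1):
    """One TD update that bootstraps from the value predicted for the
    transition's similarity set instead of a single sampled transition."""
    s, a, r, s_next, done = transition
    bootstrap = r + (0.0 if done else gamma * np.max(Q[s_next]))
    memory.add(np.array([s]), a, bootstrap)         # update the compact memory
    target = memory.predicted_target(np.array([s]), a, bootstrap)
    Q[s, a] += lr * (target - Q[s, a])               # TD update toward predicted target
    return Q

# Tiny usage example on a toy 3-state, 2-action problem.
Q = np.zeros((3, 2))
mem = SimilarTransitionMemory()
for t in [(0, 1, 1.0, 1, False), (0, 1, 0.5, 1, False), (1, 0, 2.0, 2, True)]:
    Q = q_learning_step(Q, t, mem, gamma=0.9, lr=0.5)
print(Q)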
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Neves, D.E., Ishitani, L., do Patrocínio Júnior, Z.K.G. (2022). When Less May Be More: Exploring Similarity to Improve Experience Replay. In: Xavier-Junior, J.C., Rios, R.A. (eds.) Intelligent Systems. BRACIS 2022. Lecture Notes in Computer Science, vol. 13654. Springer, Cham. https://doi.org/10.1007/978-3-031-21689-3_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21688-6
Online ISBN: 978-3-031-21689-3