When Less May Be More: Exploring Similarity to Improve Experience Replay

  • Conference paper
Intelligent Systems (BRACIS 2022)

Abstract

We propose COMPact Experience Replay (COMPER), a reinforcement learning method that seeks to reduce the number of experiences required to train an agent while preserving the total accumulated reward in the long run. COMPER combines temporal difference learning with predicted target values for sets of similar transitions and a new experience replay approach based on two transition memories. We assess two possible neural network architectures for the target network, provide a complete analysis of the memories' behavior, and report detailed results for 100,000 frames and about 25,000 iterations with a small experience memory on eight challenging Atari 2600 games from the Arcade Learning Environment (ALE). As a baseline, we also present results for a Deep Q-Network (DQN) agent trained under the same experimental protocol on the same set of games. We demonstrate that COMPER can approximate a good policy from a small number of frame observations by using a compact memory and by learning the dynamics of the sets of similar transitions with a recurrent neural network.

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior – Brasil (CAPES) – Finance Code 001. It also received partial funding from the Brazilian Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) and from the Pontifical Catholic University of Minas Gerais (PUC Minas).
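The abstract outlines two components: a replay mechanism built on two transition memories, where transitions are grouped by similarity, and target values that are predicted per similarity set (in the paper, by a recurrent network that learns the sets' dynamics). The sketch below illustrates how such a similarity-indexed, two-memory replay buffer could be organized. All identifiers (TransitionMemory, ReducedTransitionMemory, similarity_key) and the discretized state-action hash used as the similarity criterion are illustrative assumptions rather than the paper's implementation, and a simple average of observed rewards stands in for the learned recurrent target predictor.

import numpy as np
from collections import defaultdict, deque


def similarity_key(state, action, n_bins=8):
    # Illustrative similarity criterion (an assumption, not the paper's):
    # coarsely discretize the state features and pair them with the action,
    # so transitions that look alike end up under the same key.
    binned = tuple(np.floor(np.asarray(state, dtype=np.float64) * n_bins).astype(int))
    return binned, int(action)


class TransitionMemory:
    # First memory: keeps full transitions, grouped into sets of similar ones.
    def __init__(self, max_per_set=64):
        self.sets = defaultdict(lambda: deque(maxlen=max_per_set))

    def add(self, state, action, reward, next_state, done):
        key = similarity_key(state, action)
        self.sets[key].append((state, action, reward, next_state, done))
        return key


class ReducedTransitionMemory:
    # Second, compact memory: one entry per similarity set, holding the most
    # recent transition of the set together with a predicted target value.
    def __init__(self):
        self.entries = {}

    def update(self, key, transition, predicted_target):
        self.entries[key] = (transition, predicted_target)

    def sample(self, batch_size, rng):
        keys = list(self.entries.keys())
        idx = rng.choice(len(keys), size=min(batch_size, len(keys)), replace=False)
        return [self.entries[keys[i]] for i in idx]


def predict_set_target(transitions):
    # Placeholder: average the rewards observed in the set. COMPER instead
    # learns the dynamics of each similarity set with a recurrent network
    # and predicts the temporal-difference target from it.
    return float(np.mean([r for (_, _, r, _, _) in transitions]))


# Toy usage with random data standing in for an agent-environment loop.
rng = np.random.default_rng(0)
full_memory = TransitionMemory()
compact_memory = ReducedTransitionMemory()

for _ in range(1000):
    s, a, r, s2 = rng.random(4), int(rng.integers(4)), float(rng.random()), rng.random(4)
    key = full_memory.add(s, a, r, s2, done=False)
    target = predict_set_target(full_memory.sets[key])
    compact_memory.update(key, (s, a, r, s2, False), target)

batch = compact_memory.sample(32, rng)  # what the value network would train on
print(len(batch), "compact entries sampled from", len(compact_memory.entries), "similarity sets")

Because the compact memory holds one representative entry per similarity set, sampling from it touches far fewer records than a conventional replay buffer of the same coverage, which is the intuition behind training from a small number of frame observations.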

Notes

  1. Available at https://github.com/mgbellemare/Arcade-Learning-Environment.

  2. https://github.com/DanielEugenioNeves/COMPER-RELEASE-RESULTS.

Author information

Corresponding author

Correspondence to Daniel Eugênio Neves.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Neves, D.E., Ishitani, L., do Patrocínio Júnior, Z.K.G. (2022). When Less May Be More: Exploring Similarity to Improve Experience Replay. In: Xavier-Junior, J.C., Rios, R.A. (eds) Intelligent Systems. BRACIS 2022. Lecture Notes in Computer Science, vol 13654. Springer, Cham. https://doi.org/10.1007/978-3-031-21689-3_8

  • DOI: https://doi.org/10.1007/978-3-031-21689-3_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-21688-6

  • Online ISBN: 978-3-031-21689-3

  • eBook Packages: Computer Science, Computer Science (R0)
