
Hindsight-Combined and Hindsight-Prioritized Experience Replay

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12533)

Abstract

Reinforcement learning has proved to be of great utility; training, however, may be costly due to sample inefficiency. An efficient method for improving training is experience replay, which reuses past experiences. Several experience replay techniques, namely combined experience replay, hindsight experience replay, and prioritized experience replay, have been proposed, yet their relative merits remain unclear. This study proposes two hybrid algorithms, hindsight-combined and hindsight-prioritized experience replay, and evaluates their performance against published baselines. Experimental results demonstrate the superior performance of hindsight-combined experience replay on an OpenAI Gym benchmark. Furthermore, insight into the nonconvergence of hindsight-prioritized experience replay is presented towards improving the approach.
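For context, hindsight experience replay (Andrychowicz et al., 2017) relabels stored transitions with goals that were actually achieved, turning failed episodes into useful learning signal, while combined experience replay (Zhang and Sutton, 2017) simply guarantees that the newest transition appears in every sampled minibatch. The following is a minimal sketch of how these two mechanisms might compose into hindsight-combined experience replay; it is an illustration under assumed interfaces (the buffer class, `reward_fn`, and the transition tuple layout are hypothetical), not the authors' implementation.

```python
# Minimal sketch of hindsight-combined experience replay (illustrative only):
# HER-style "final" goal relabelling on top of the CER rule that the most
# recent transition is always part of each sampled minibatch.
import random
from collections import deque


class HindsightCombinedBuffer:
    """Hypothetical buffer; entries are (state, action, reward, next_state, goal)."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def add_episode(self, episode, reward_fn):
        """episode: list of (state, action, next_state, goal) tuples.

        Stores each transition twice: once with the original goal, and once
        relabelled with the goal actually achieved at the end of the episode
        (HER's 'final' strategy), recomputing the reward in hindsight.
        """
        achieved = episode[-1][2]  # final next_state stands in for the achieved goal
        for state, action, next_state, goal in episode:
            self.buffer.append(
                (state, action, reward_fn(next_state, goal), next_state, goal))
            self.buffer.append(
                (state, action, reward_fn(next_state, achieved), next_state, achieved))

    def sample(self, batch_size):
        """CER rule: the newest transition is always included in the batch."""
        newest = self.buffer[-1]
        rest = random.sample(list(self.buffer),
                             min(batch_size - 1, len(self.buffer)))
        return [newest] + rest


# Toy usage with a sparse reward: 0 when the goal is reached, -1 otherwise.
reward_fn = lambda next_state, goal: 0.0 if next_state == goal else -1.0
buf = HindsightCombinedBuffer()
buf.add_episode([(0, 1, 1, 3), (1, 1, 2, 3)], reward_fn)
batch = buf.sample(batch_size=4)  # always contains the newest transition
```

Hindsight-prioritized experience replay would instead replace the uniform `random.sample` above with sampling proportional to priorities derived from temporal-difference errors; as the abstract notes, the paper reports that this combination fails to converge and examines why.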

Supported by the Japan Society for the Promotion of Science through the Grants-in-Aid for Scientific Research Program (KAKENHI 18K19821).


Notes

  1. This is found in: https://github.com/renzopereztan/HyER.

  2. The link is: https://gym.openai.com/envs/LunarLander-v2. A minimal loading snippet follows this list.
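As a hedged illustration of the benchmark in note 2, the snippet below loads LunarLander-v2 and runs one episode with random actions. It assumes the classic pre-0.26 Gym API (four-value `step` return) that was current when the paper was published, and the Box2D extra must be installed.

```python
# Hypothetical smoke test for the LunarLander-v2 benchmark (note 2).
# Assumes the classic Gym API (gym < 0.26); install with: pip install gym[box2d]
import gym

env = gym.make("LunarLander-v2")
obs = env.reset()                 # initial observation
done = False
total_reward = 0.0
while not done:
    action = env.action_space.sample()          # random policy placeholder
    obs, reward, done, info = env.step(action)  # classic 4-tuple step return
    total_reward += reward
env.close()
print(f"episode return: {total_reward:.1f}")
```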


Author information

Corresponding author

Correspondence to Renzo Roel P. Tan.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Tan, R.R.P., Ikeda, K., Vergara, J.P.C. (2020). Hindsight-Combined and Hindsight-Prioritized Experience Replay. In: Yang, H., Pasupa, K., Leung, A.C.S., Kwok, J.T., Chan, J.H., King, I. (eds) Neural Information Processing. ICONIP 2020. Lecture Notes in Computer Science, vol 12533. Springer, Cham. https://doi.org/10.1007/978-3-030-63833-7_36


  • DOI: https://doi.org/10.1007/978-3-030-63833-7_36

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-63832-0

  • Online ISBN: 978-3-030-63833-7

  • eBook Packages: Computer Science, Computer Science (R0)
