
Hindsight Curriculum Generation Based Multi-Goal Experience Replay

  • Conference paper
  • Conference series: Image and Graphics (ICIG 2021)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12890)

Abstract

In multi-goal tasks, an agent learns to achieve diverse goals from past experience. Hindsight Experience Replay (HER), which replays experiences with pseudo goals, has shown the potential to learn from failed experiences. However, not all pseudo goals are explored well enough to provide reliable value estimates. From the standpoint of value estimation, the agent should progress from achievable goals toward the desired goal distribution. To tackle this problem, we propose to generate a hindsight curriculum, which maintains a sequence of balanced distributions over achieved goals for replay. Based on the hindsight curriculum, the agent evaluates hindsight experiences against a batch of similar, well-explored experiences and strikes a dynamic balance between function approximation and task solving. We implement Hindsight Curriculum Generation (HCG) on top of vanilla Deep Deterministic Policy Gradient (DDPG), and experiments on several multi-goal tasks with sparse binary rewards demonstrate that HCG improves the sample efficiency of the state of the art.
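
For readers unfamiliar with the mechanics the abstract refers to, the sketch below illustrates the two ingredients in simplified form: HER-style relabeling of transitions with pseudo goals achieved later in the same episode, and a curriculum weighting over achieved goals that favours well-explored goals early and flattens toward the full goal distribution as a temperature is annealed. The function names (sparse_reward, her_relabel, curriculum_weights), the "future" relabeling strategy, and the count-based weighting rule are illustrative assumptions made for this page, not the paper's exact HCG procedure.

    import numpy as np

    def sparse_reward(achieved, goal, eps=0.05):
        # Binary sparse reward: 0 if the achieved goal lies within eps of the goal, else -1.
        return 0.0 if np.linalg.norm(np.asarray(achieved) - np.asarray(goal)) < eps else -1.0

    def her_relabel(episode, rng):
        # Relabel every transition with a pseudo goal achieved at a later step of the
        # same episode (the common "future" strategy) and recompute its sparse reward.
        relabeled = []
        for t, (obs, action, next_obs, achieved_next) in enumerate(episode):
            future = int(rng.integers(t, len(episode)))  # index of some step >= t
            pseudo_goal = episode[future][3]             # goal actually achieved there
            relabeled.append((obs, action, next_obs, pseudo_goal,
                              sparse_reward(achieved_next, pseudo_goal)))
        return relabeled

    def curriculum_weights(goal_visit_counts, temperature):
        # Hypothetical curriculum weighting (not the paper's exact rule): at low
        # temperature, sampling concentrates on frequently visited, well-explored
        # achieved goals; raising the temperature over training flattens the
        # distribution toward all achieved goals.
        counts = np.asarray(goal_visit_counts, dtype=np.float64)
        logits = np.log(counts + 1.0) / max(temperature, 1e-6)
        w = np.exp(logits - logits.max())
        return w / w.sum()

    # Toy usage with a three-step episode in a 1-D goal space.
    rng = np.random.default_rng(0)
    episode = [((0.0,), 0.1, (0.1,), (0.1,)),
               ((0.1,), 0.2, (0.3,), (0.3,)),
               ((0.3,), 0.1, (0.4,), (0.4,))]
    print(her_relabel(episode, rng))
    print(curriculum_weights([50, 5, 1], temperature=0.5))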



Acknowledgements

This work is supported by the National Natural Science Foundation of China (No. 61836011). We would like to thank all the anonymous reviewers for their insightful comments.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Feng, X. (2021). Hindsight Curriculum Generation Based Multi-Goal Experience Replay. In: Peng, Y., Hu, S.-M., Gabbouj, M., Zhou, K., Elad, M., Xu, K. (eds) Image and Graphics. ICIG 2021. Lecture Notes in Computer Science, vol. 12890. Springer, Cham. https://doi.org/10.1007/978-3-030-87361-5_15

  • DOI: https://doi.org/10.1007/978-3-030-87361-5_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-87360-8

  • Online ISBN: 978-3-030-87361-5

  • eBook Packages: Computer Science (R0)
