Hindsight Curriculum Generation Based Multi-Goal Experience Replay

Feng, Xiaoyun

doi:10.1007/978-3-030-87361-5_15

Xiaoyun Feng

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12890)

Included in the following conference series:

International Conference on Image and Graphics

Abstract

In multi-goal tasks, an agent learns to achieve diverse goals from past experiences. Hindsight Experience Replay (HER), which replays experiences with pseudo goals, has shown the potential to learn from failed experiences. However, not all pseudo goals are explored well enough to provide reliable value estimates. From the perspective of value estimation, the agent should progress gradually from achievable goals toward the distribution of desired goals. To address this problem, we propose generating a hindsight curriculum, which maintains a sequence of balanced distributions of achieved goals to replay. Based on the hindsight curriculum, the agent evaluates hindsight experiences against a batch of similar, well-explored experiences and strikes a dynamic balance between function approximation and task solving. We implement Hindsight Curriculum Generation (HCG) on top of vanilla Deep Deterministic Policy Gradient (DDPG). Experiments on several multi-goal tasks with sparse binary rewards demonstrate that HCG improves on the sample efficiency of state-of-the-art methods.
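
To make the idea of replaying experiences with pseudo goals concrete, the following is a minimal sketch of standard HER-style relabeling, not of the HCG method itself, whose details the abstract does not give. The function names, the dictionary keys, the distance tolerance `tol`, the 0/-1 reward convention, and the choice of sampling pseudo goals from later steps of the same episode are all assumptions made for illustration.

```python
import random


def sparse_reward(achieved, goal, tol=0.05):
    """Sparse binary reward (assumed convention): 0 on success, -1 otherwise."""
    dist = sum((a - g) ** 2 for a, g in zip(achieved, goal)) ** 0.5
    return 0.0 if dist <= tol else -1.0


def her_relabel(episode, k=4, rng=None):
    """Return the original transitions plus k hindsight copies of each.

    `episode` is a list of dicts with the hypothetical keys
    'obs', 'action', 'achieved_next' and 'goal'.
    """
    rng = rng or random.Random(0)
    horizon = len(episode)
    replay = []
    for t, tr in enumerate(episode):
        # Keep the transition with its original desired goal.
        replay.append({**tr, "reward": sparse_reward(tr["achieved_next"], tr["goal"])})
        for _ in range(k):
            # Pseudo goal: a goal actually achieved at step t or later in this episode.
            future = rng.randrange(t, horizon)
            pseudo_goal = episode[future]["achieved_next"]
            replay.append({
                **tr,
                "goal": pseudo_goal,
                "reward": sparse_reward(tr["achieved_next"], pseudo_goal),
            })
    return replay
```

In this sketch every pseudo goal is drawn uniformly from the rest of the episode. The abstract's point is that such goals are not equally well explored, which is the motivation for replacing uniform sampling with a curriculum over achieved goals.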

Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 61836011). We would like to thank all the anonymous reviewers for their insightful comments.

Author information

Authors and Affiliations

University of Science and Technology of China, Hefei, Anhui, China
Xiaoyun Feng

Editor information

Editors and Affiliations

Peking University, Beijing, China
Yuxin Peng
Tsinghua University, Beijing, China
Shi-Min Hu
Tampere University, Tampere, Finland
Moncef Gabbouj
Zhejiang University, Hangzhou, China
Kun Zhou
Technion – Israel Institute of Technology, Haifa, Israel
Michael Elad
Tsinghua University, Beijing, China
Kun Xu

Cite this paper

Feng, X. (2021). Hindsight Curriculum Generation Based Multi-Goal Experience Replay. In: Peng, Y., Hu, SM., Gabbouj, M., Zhou, K., Elad, M., Xu, K. (eds) Image and Graphics. ICIG 2021. Lecture Notes in Computer Science, vol 12890. Springer, Cham. https://doi.org/10.1007/978-3-030-87361-5_15

DOI: https://doi.org/10.1007/978-3-030-87361-5_15
Published: 30 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-87360-8
Online ISBN: 978-3-030-87361-5
eBook Packages: Computer Science, Computer Science (R0)