Abstract
In multi-goal tasks, an agent learns to achieve diverse goals from past experiences. Hindsight Experience Replay (HER), which replays experiences with pseudo goals, has shown the potential to learn from failed experiences. However, not all pseudo goals are explored well enough to provide reliable value estimates. From the perspective of value estimation, the agent should learn progressively, starting from achievable goals and moving towards the distribution of desired goals. To this end, we propose to generate a hindsight curriculum, which maintains a sequence of balancing distributions over achieved goals to replay. Based on the hindsight curriculum, the agent evaluates hindsight experiences with a batch of similar well-explored experiences, and strikes a dynamic balance between function approximation and task solving. We implement Hindsight Curriculum Generation (HCG) with vanilla Deep Deterministic Policy Gradient (DDPG), and experiments on several multi-goal tasks with sparse binary rewards demonstrate that HCG improves sample efficiency over state-of-the-art methods.
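To make the idea of replaying with pseudo goals concrete, the following is a minimal sketch of generic hindsight relabeling with a sparse binary reward. It is not the HCG procedure from this paper. The names `Transition`, `sparse_reward`, `relabel_future`, the 0/-1 reward convention, and the tolerance `tol` are all assumptions made for illustration.

```python
import random
from typing import List, NamedTuple, Tuple

Goal = Tuple[float, ...]


class Transition(NamedTuple):
    """Hypothetical container for one goal-conditioned transition."""
    state: Tuple[float, ...]
    action: Tuple[float, ...]
    achieved_goal: Goal  # goal actually reached after taking the action
    goal: Goal           # goal the agent was conditioned on
    reward: float


def sparse_reward(achieved: Goal, goal: Goal, tol: float = 0.05) -> float:
    """Sparse binary reward: 0 within tolerance of the goal, -1 otherwise.

    The 0/-1 convention and the tolerance value are assumptions.
    """
    dist = sum((a - g) ** 2 for a, g in zip(achieved, goal)) ** 0.5
    return 0.0 if dist <= tol else -1.0


def relabel_future(episode: List[Transition], rng: random.Random) -> List[Transition]:
    """Relabel each transition with a pseudo goal achieved at the same or a later step."""
    relabeled = []
    for t, tr in enumerate(episode):
        # Pick an achieved goal from step t onwards to serve as the pseudo goal.
        future = episode[rng.randrange(t, len(episode))]
        pseudo_goal = future.achieved_goal
        # Recompute the reward as if the pseudo goal had been the target all along.
        new_reward = sparse_reward(tr.achieved_goal, pseudo_goal)
        relabeled.append(tr._replace(goal=pseudo_goal, reward=new_reward))
    return relabeled
```

In this sketch, an episode that never reaches its desired goal still yields transitions with a success reward once relabeled, which is the sense in which failed experiences become useful. Which achieved goals to replay, and in what proportion, is the question the hindsight curriculum addresses; that selection step is not shown here.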
Acknowledgements
This work is supported by the National Natural Science Foundation of China (No. 61836011). We would like to thank all the anonymous reviewers for their insightful comments.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Feng, X. (2021). Hindsight Curriculum Generation Based Multi-Goal Experience Replay. In: Peng, Y., Hu, SM., Gabbouj, M., Zhou, K., Elad, M., Xu, K. (eds) Image and Graphics. ICIG 2021. Lecture Notes in Computer Science, vol 12890. Springer, Cham. https://doi.org/10.1007/978-3-030-87361-5_15
DOI: https://doi.org/10.1007/978-3-030-87361-5_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-87360-8
Online ISBN: 978-3-030-87361-5
eBook Packages: Computer Science, Computer Science (R0)