A Novel Reinforcement Learning Sampling Method Without Additional Environment Feedback in Hindsight Experience Replay

Conference paper

Robot Intelligence Technology and Applications 6 (RiTA 2021)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 429)


Abstract

Hindsight Experience Replay (HER) in reinforcement learning trains an agent by substituting the real goal with hindsight goals (virtual goals). This technique improves data efficiency and speeds up the learning process. To choose hindsight goals efficiently, previous research suggested an Energy-Based Prioritization (EBP) method. However, for complex robotic tasks in which the RL agent interacts with objects in the environment, EBP requires object information such as location and velocity, which is not feasible in real-world applications. In this paper, we propose a Trajectory Behaviour Prioritization (TBP) method that removes the need for additional environment feedback while maintaining competitive learning performance. We define a trajectory behaviour weight function that captures good behaviours within a trajectory. We evaluate our TBP approach on two challenging robotic manipulation tasks in simulation; the results show that our approach performs well despite having no information about the objects. This work serves as a step towards accelerating the training of reinforcement learning for complex real-world robotics tasks.
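The paper's exact trajectory behaviour weight function is defined in the full text; the sketch below only illustrates the kind of sampling scheme the abstract describes. It combines standard HER "future" goal relabelling [6] with episode sampling weighted by a score computed purely from the agent's own trajectory, so that no object position or velocity feedback is needed, unlike EBP [7]. The function names, the transition tuple layout, the distance threshold, and the placeholder weight (total displacement of the achieved goal) are assumptions made for illustration, not the authors' method.

```python
import numpy as np

# Assumed storage: each episode is a list of transitions
#   (obs, action, reward, next_obs, achieved_goal)
# where achieved_goal is the goal actually reached after the step,
# as in OpenAI Gym goal-conditioned environments [10].

def her_relabel(episode, k=4, rng=None):
    """Standard HER 'future' relabelling: replace the real goal with
    achieved goals sampled from later steps of the same trajectory."""
    rng = rng if rng is not None else np.random.default_rng()
    relabeled = []
    for t, (obs, action, _, next_obs, achieved_goal) in enumerate(episode):
        future_idx = rng.integers(t, len(episode), size=k)
        for idx in future_idx:
            virtual_goal = episode[idx][4]  # achieved goal at a future step
            # Sparse reward: 0 if the virtual goal was (approximately) reached.
            reward = 0.0 if np.allclose(achieved_goal, virtual_goal, atol=0.05) else -1.0
            relabeled.append((obs, action, reward, next_obs, virtual_goal))
    return relabeled

def trajectory_weight(episode):
    """Placeholder trajectory-behaviour score: total displacement of the
    achieved goal over the episode. Uses only agent-side signals; no
    object location or velocity is queried from the environment."""
    achieved = np.array([step[4] for step in episode])
    return np.linalg.norm(np.diff(achieved, axis=0), axis=1).sum()

def sample_episode(replay_buffer, rng=None):
    """Sample an episode with probability proportional to its weight,
    falling back to uniform sampling when all weights are zero."""
    rng = rng if rng is not None else np.random.default_rng()
    weights = np.array([trajectory_weight(ep) for ep in replay_buffer])
    probs = weights / weights.sum() if weights.sum() > 0 else None
    return replay_buffer[rng.choice(len(replay_buffer), p=probs)]
```

In a training loop, a sampled episode would be passed through her_relabel before its transitions are added to the minibatch; the placeholder trajectory_weight would be replaced by the trajectory behaviour weight function defined in the paper.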

Change history

  • 15 September 2022

    The original version of this chapter was inadvertently published with incorrect authors’ last names in Ref. [9], which have now been corrected from “Galloupédec, Q., Cazin, N., Dellandrpéa, E., Chen, L” to “Gallouédec, Q., Cazin, N., Dellandréa, E., Chen, L”. The chapter has been updated with the changes.

References

  1. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction, vol. 1. MIT Press, Cambridge (1998)

  2. Wiering, M., Van Otterlo, M. (eds.): Reinforcement learning. In: Adaptation, Learning, and Optimization, vol. 12. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-27645-3

  3. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning, vol. 1. MIT Press, Cambridge (2016)

  4. Arulkumaran, K., Cully, A., Togelius, J.: AlphaStar: an evolutionary computation perspective. In: Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 314–315 (2019)

  5. Wang, X., et al.: SCC: an efficient deep reinforcement learning agent mastering the game of StarCraft II. In: International Conference on Machine Learning, PMLR 2021, pp. 10905–10915 (2021)

  6. Andrychowicz, M., et al.: Hindsight experience replay. In: Advances in Neural Information Processing Systems, pp. 5048–5058 (2017)

  7. Zhao, R., Tresp, V.: Energy-based hindsight experience prioritization. In: Conference on Robot Learning, PMLR 2018, pp. 113–122 (2018)

  8. Nguyen, H., La, H.: Review of deep reinforcement learning for robot manipulation. In: 2019 3rd IEEE International Conference on Robotic Computing (IRC), pp. 590–595. IEEE (2019)

  9. Gallouédec, Q., Cazin, N., Dellandréa, E., Chen, L.: Multi-goal reinforcement learning environments for simulated Franka Emika Panda robot. arXiv preprint arXiv:2106.13687 (2021)

  10. Brockman, G., et al.: OpenAI Gym. arXiv preprint arXiv:1606.01540 (2016)

  11. Lin, L.-J.: Self-improving reactive agents based on reinforcement learning, planning and teaching. Mach. Learn. 8(3–4), 293–321 (1992)

  12. Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)

  13. Schaul, T., Quan, J., Antonoglou, I., Silver, D.: Prioritized experience replay. In: International Conference on Learning Representations (2016)

  14. Elman, J.L.: Learning and development in neural networks: the importance of starting small. Cognition 48(1), 71–99 (1993)

  15. Bengio, Y., Louradour, J., Collobert, R., Weston, J.: Curriculum learning. In: Proceedings of the 26th Annual International Conference on Machine Learning, pp. 41–48. ACM (2009)

  16. Zaremba, W., Sutskever, I.: Learning to execute. arXiv preprint arXiv:1410.4615 (2014)

  17. Graves, A., Bellemare, M.G., Menick, J., Munos, R., Kavukcuoglu, K.: Automated curriculum learning for neural networks. arXiv preprint arXiv:1704.03003 (2017)

  18. Sukhbaatar, S., Lin, Z., Kostrikov, I., Synnaeve, G., Szlam, A., Fergus, R.: Intrinsic motivation and automatic curricula via asymmetric self-play. arXiv preprint arXiv:1703.05407 (2017)

  19. Srivastava, R.K., Steunebrink, B.R., Schmidhuber, J.: First experiments with powerplay. Neural Netw. 41, 130–136 (2013)

  20. Schmidhuber, J.: Optimal ordered problem solver. Mach. Learn. 54(3), 211–254 (2004)

  21. Florensa, C., Held, D., Wulfmeier, M., Abbeel, P.: Reverse curriculum generation for reinforcement learning. arXiv preprint arXiv:1707.05300 (2017)

  22. Thrun, S.B.: Efficient exploration in reinforcement learning (1992)

  23. Puterman, M.L.: Markov decision processes. In: Handbooks in Operations Research and Management Science, vol. 2, pp. 331–434 (1990)

  24. Garcia, F., Rachelson, E.: Markov decision processes. In: Markov Decision Processes in Artificial Intelligence, pp. 1–38 (2013)

  25. Bennett, C.C., Hauser, K.: Artificial intelligence framework for simulating clinical decision-making: a Markov decision process approach. Artif. Intell. Med. 57(1), 9–19 (2013)

  26. Lillicrap, T.P., et al.: Continuous control with deep reinforcement learning. In: International Conference on Learning Representations (2016)

  27. Polyak, B.T., Juditsky, A.B.: Acceleration of stochastic approximation by averaging. SIAM J. Control. Optim. 30(4), 838–855 (1992)

  28. Dearden, R., Friedman, N., Russell, S.: Bayesian Q-learning. In: AAAI/IAAI 1998, pp. 761–768 (1998)

  29. Haarnoja, T., et al.: Soft actor-critic algorithms and applications. arXiv preprint arXiv:1812.05905 (2018)

  30. Zhao, D., Wang, H., Shao, K., Zhu, Y.: Deep reinforcement learning with experience replay based on SARSA. In: 2016 IEEE Symposium Series on Computational Intelligence (SSCI), pp. 1–6. IEEE (2016)

Author information

Corresponding author

Correspondence to Shahram Eivazi.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG


Cite this paper

Li, C., Liu, Y., Bing, Z., Seyler, J., Eivazi, S. (2022). A Novel Reinforcement Learning Sampling Method Without Additional Environment Feedback in Hindsight Experience Replay. In: Kim, J., et al. Robot Intelligence Technology and Applications 6. RiTA 2021. Lecture Notes in Networks and Systems, vol 429. Springer, Cham. https://doi.org/10.1007/978-3-030-97672-9_42
