A Novel Reinforcement Learning Sampling Method Without Additional Environment Feedback in Hindsight Experience Replay

Conference paper

Robot Intelligence Technology and Applications 6 (RiTA 2021)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 429)


Abstract

Hindsight Experience Replay (HER) in reinforcement learning trains an agent by substituting the real goal with hindsight goals (virtual goals). This technique improves data efficiency and speeds up the learning process. To choose hindsight goals efficiently, previous research suggested an Energy-Based Prioritization (EBP) method. However, for complex robotic tasks in which the RL agent interacts with objects in the environment, EBP requires object information such as location and velocity, which is not feasible in real-world applications. In this paper, we propose a Trajectory Behaviour Prioritization (TBP) method that removes the need for additional environment feedback while maintaining competitive learning performance. We define a trajectory behaviour weight function that captures good behaviours within a trajectory. We evaluate our TBP approach on two challenging robotic manipulation tasks in simulation; the results show that our approach performs well despite having no information about the objects. This work serves as a step towards accelerating the training of reinforcement learning for complex real-world robotics tasks.
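The paper's exact trajectory behaviour weight function is defined in the full text; the sketch below only illustrates the kind of sampling scheme the abstract describes. It combines standard HER "future" goal relabelling [6] with episode sampling weighted by a score computed purely from the agent's own trajectory, so that no object position or velocity feedback is needed, unlike EBP [7]. The function names, the transition tuple layout, the distance threshold, and the placeholder weight (total displacement of the achieved goal) are assumptions made for illustration, not the authors' method.

```python
import numpy as np

# Assumed storage: each episode is a list of transitions
#   (obs, action, reward, next_obs, achieved_goal)
# where achieved_goal is the goal actually reached after the step,
# as in OpenAI Gym goal-conditioned environments [10].

def her_relabel(episode, k=4, rng=None):
    """Standard HER 'future' relabelling: replace the real goal with
    achieved goals sampled from later steps of the same trajectory."""
    rng = rng if rng is not None else np.random.default_rng()
    relabeled = []
    for t, (obs, action, _, next_obs, achieved_goal) in enumerate(episode):
        future_idx = rng.integers(t, len(episode), size=k)
        for idx in future_idx:
            virtual_goal = episode[idx][4]  # achieved goal at a future step
            # Sparse reward: 0 if the virtual goal was (approximately) reached.
            reward = 0.0 if np.allclose(achieved_goal, virtual_goal, atol=0.05) else -1.0
            relabeled.append((obs, action, reward, next_obs, virtual_goal))
    return relabeled

def trajectory_weight(episode):
    """Placeholder trajectory-behaviour score: total displacement of the
    achieved goal over the episode. Uses only agent-side signals; no
    object location or velocity is queried from the environment."""
    achieved = np.array([step[4] for step in episode])
    return np.linalg.norm(np.diff(achieved, axis=0), axis=1).sum()

def sample_episode(replay_buffer, rng=None):
    """Sample an episode with probability proportional to its weight,
    falling back to uniform sampling when all weights are zero."""
    rng = rng if rng is not None else np.random.default_rng()
    weights = np.array([trajectory_weight(ep) for ep in replay_buffer])
    probs = weights / weights.sum() if weights.sum() > 0 else None
    return replay_buffer[rng.choice(len(replay_buffer), p=probs)]
```

In a training loop, a sampled episode would be passed through her_relabel before its transitions are added to the minibatch; the placeholder trajectory_weight would be replaced by the trajectory behaviour weight function defined in the paper.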

Change history

  • 15 September 2022

    The original version of this chapter was inadvertently published with incorrect authors’ last names in Ref. [9], which have now been corrected from “Galloupédec, Q., Cazin, N., Dellandrpéa, E., Chen, L” to “Gallouédec, Q., Cazin, N., Dellandréa, E., Chen, L”. The chapter has been updated with the changes.

References

  1. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction, vol. 1. MIT Press, Cambridge (1998)

  2. Wiering, M., Van Otterlo, M. (eds.): Reinforcement learning. In: Adaptation, Learning, and Optimization, vol. 12. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-27645-3

  3. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning, vol. 1. MIT Press, Cambridge (2016)

  4. Arulkumaran, K., Cully, A., Togelius, J.: AlphaStar: an evolutionary computation perspective. In: Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 314–315 (2019)

  5. Wang, X., et al.: SCC: an efficient deep reinforcement learning agent mastering the game of StarCraft II. In: International Conference on Machine Learning, PMLR 2021, pp. 10905–10915 (2021)

  6. Andrychowicz, M., et al.: Hindsight experience replay. In: Advances in Neural Information Processing Systems, pp. 5048–5058 (2017)

  7. Zhao, R., Tresp, V.: Energy-based hindsight experience prioritization. In: Conference on Robot Learning, PMLR 2018, pp. 113–122 (2018)

  8. Nguyen, H., La, H.: Review of deep reinforcement learning for robot manipulation. In: 2019 3rd IEEE International Conference on Robotic Computing (IRC), pp. 590–595. IEEE (2019)

  9. Gallouédec, Q., Cazin, N., Dellandréa, E., Chen, L.: Multi-goal reinforcement learning environments for simulated Franka Emika Panda robot. arXiv preprint arXiv:2106.13687 (2021)

  10. Brockman, G., et al.: OpenAI Gym. arXiv preprint arXiv:1606.01540 (2016)

  11. Lin, L.-J.: Self-improving reactive agents based on reinforcement learning, planning and teaching. Mach. Learn. 8(3–4), 293–321 (1992)

  12. Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)

  13. Schaul, T., Quan, J., Antonoglou, I., Silver, D.: Prioritized experience replay. In: International Conference on Learning Representations (2016)

  14. Elman, J.L.: Learning and development in neural networks: the importance of starting small. Cognition 48(1), 71–99 (1993)

  15. Bengio, Y., Louradour, J., Collobert, R., Weston, J.: Curriculum learning. In: Proceedings of the 26th Annual International Conference on Machine Learning, pp. 41–48. ACM (2009)

  16. Zaremba, W., Sutskever, I.: Learning to execute. arXiv preprint arXiv:1410.4615 (2014)

  17. Graves, A., Bellemare, M.G., Menick, J., Munos, R., Kavukcuoglu, K.: Automated curriculum learning for neural networks. arXiv preprint arXiv:1704.03003 (2017)

  18. Sukhbaatar, S., Lin, Z., Kostrikov, I., Synnaeve, G., Szlam, A., Fergus, R.: Intrinsic motivation and automatic curricula via asymmetric self-play. arXiv preprint arXiv:1703.05407 (2017)

  19. Srivastava, R.K., Steunebrink, B.R., Schmidhuber, J.: First experiments with powerplay. Neural Netw. 41, 130–136 (2013)

  20. Schmidhuber, J.: Optimal ordered problem solver. Mach. Learn. 54(3), 211–254 (2004)

  21. Florensa, C., Held, D., Wulfmeier, M., Abbeel, P.: Reverse curriculum generation for reinforcement learning. arXiv preprint arXiv:1707.05300 (2017)

  22. Thrun, S.B.: Efficient exploration in reinforcement learning (1992)

  23. Puterman, M.L.: Markov decision processes. In: Handbooks in Operations Research and Management Science, vol. 2, pp. 331–434 (1990)

  24. Garcia, F., Rachelson, E.: Markov decision processes. In: Markov Decision Processes in Artificial Intelligence, pp. 1–38 (2013)

  25. Bennett, C.C., Hauser, K.: Artificial intelligence framework for simulating clinical decision-making: a Markov decision process approach. Artif. Intell. Med. 57(1), 9–19 (2013)

  26. Lillicrap, T.P., et al.: Continuous control with deep reinforcement learning. In: International Conference on Learning Representations (2016)

  27. Polyak, B.T., Juditsky, A.B.: Acceleration of stochastic approximation by averaging. SIAM J. Control. Optim. 30(4), 838–855 (1992)

  28. Dearden, R., Friedman, N., Russell, S.: Bayesian Q-learning. In: AAAI/IAAI 1998, pp. 761–768 (1998)

  29. Haarnoja, T., et al.: Soft actor-critic algorithms and applications. arXiv preprint arXiv:1812.05905 (2018)

  30. Zhao, D., Wang, H., Shao, K., Zhu, Y.: Deep reinforcement learning with experience replay based on SARSA. In: 2016 IEEE Symposium Series on Computational Intelligence (SSCI), pp. 1–6. IEEE (2016)

Author information

Corresponding author

Correspondence to Shahram Eivazi.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG


Cite this paper

Li, C., Liu, Y., Bing, Z., Seyler, J., Eivazi, S. (2022). A Novel Reinforcement Learning Sampling Method Without Additional Environment Feedback in Hindsight Experience Replay. In: Kim, J., et al. Robot Intelligence Technology and Applications 6. RiTA 2021. Lecture Notes in Networks and Systems, vol 429. Springer, Cham. https://doi.org/10.1007/978-3-030-97672-9_42
