Abstract
Hindsight Experience Replay (HER) in reinforcement learning trains an agent by substituting the real goal with hindsight goals (virtual goals). This technique improves data efficiency and speeds up learning. To choose hindsight goals efficiently, previous research suggested an Energy-Based Prioritization (EBP) method. However, for complex robotic tasks in which the RL agent interacts with objects in the environment, EBP requires object information such as location and velocity, which is not feasible for real-world applications. In this paper, we propose a Trajectory Behaviour Prioritization (TBP) method that removes the need for additional environment feedback while maintaining competitive learning performance. We define a trajectory behaviour weight function that accounts for good behaviours within a trajectory. We evaluate our TBP approach on two challenging robotic manipulation tasks in simulation. The results show that our approach performs well despite having no object-related information. This work serves as a step towards accelerating the training of reinforcement learning for complex real-world robotics tasks.
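The core HER mechanism the abstract builds on, replacing the real goal with goals actually achieved later in the same trajectory, can be sketched as follows. This is a minimal illustration under the common "future" relabeling strategy; the `Transition` structure, sparse `reward_fn`, and sampling scheme are illustrative assumptions, not the paper's exact implementation (the TBP trajectory behaviour weight itself is not specified in this excerpt).

```python
import random
from dataclasses import dataclass

@dataclass
class Transition:
    state: tuple       # (observation, achieved_goal)
    action: int
    goal: tuple        # goal the agent was pursuing when acting
    reward: float

def reward_fn(achieved_goal, goal):
    # Sparse goal-reaching reward: 0 on success, -1 otherwise.
    return 0.0 if achieved_goal == goal else -1.0

def her_relabel(trajectory, k=4, rng=random):
    """For each step, sample up to k goals achieved later in the
    trajectory and relabel the transition as if each had been the
    real goal, recomputing the reward in hindsight."""
    relabeled = []
    for t, tr in enumerate(trajectory):
        future = trajectory[t:]
        for _ in range(min(k, len(future))):
            virtual_goal = rng.choice(future).state[1]  # an achieved goal
            relabeled.append(Transition(
                state=tr.state,
                action=tr.action,
                goal=virtual_goal,
                reward=reward_fn(tr.state[1], virtual_goal),
            ))
    return relabeled
```

Because every virtual goal was actually reached at some point, many relabeled transitions carry a success reward, which is what densifies the otherwise sparse feedback and accelerates learning.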
Change history
15 September 2022
The original version of this chapter was inadvertently published with incorrect authors’ last names in Ref. [9], which have now been corrected from “Galloupédec, Q., Cazin, N., Dellandrpéa, E., Chen, L” to “Gallouédec, Q., Cazin, N., Dellandréa, E., Chen, L”. The chapter has been updated with the changes.
References
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction, vol. 1. MIT Press, Cambridge (1998)
Wiering, M., Van Otterlo, M. (eds.): Reinforcement learning. In: Adaptation, Learning, and Optimization, vol. 12. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-27645-3
Goodfellow, I., Bengio, Y., Courville, A., Bengio, Y.: Deep Learning, vol. 1. MIT Press, Cambridge (2016)
Arulkumaran, K., Cully, A., Togelius, J.: AlphaStar: an evolutionary computation perspective. In: Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 314–315 (2019)
Wang, X., et al.: SCC: an efficient deep reinforcement learning agent mastering the game of StarCraft II. In: International Conference on Machine Learning, PMLR 2021, pp. 10905–10915 (2021)
Andrychowicz, M., et al.: Hindsight experience replay. In: Advances in Neural Information Processing Systems, pp. 5048–5058 (2017)
Zhao, R., Tresp, V.: Energy-based hindsight experience prioritization. In: Conference on Robot Learning, PMLR 2018, pp. 113–122 (2018)
Nguyen, H., La, H.: Review of deep reinforcement learning for robot manipulation. In: 2019 3rd IEEE International Conference on Robotic Computing (IRC), pp. 590–595. IEEE (2019)
Gallouédec, Q., Cazin, N., Dellandréa, E., Chen, L.: Multi-goal reinforcement learning environments for simulated Franka Emika Panda robot. arXiv arXiv:2106.13687 [cs.LG] (2021)
Brockman, G., et al.: OpenAI Gym. arXiv preprint arXiv:1606.01540 (2016)
Lin, L.-J.: Self-improving reactive agents based on reinforcement learning, planning and teaching. Mach. Learn. 8(3–4), 293–321 (1992)
Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)
Schaul, T., Quan, J., Antonoglou, I., Silver, D.: Prioritized experience replay. In: International Conference on Learning Representations (2016)
Elman, J.L.: Learning and development in neural networks: the importance of starting small. Cognition 48(1), 71–99 (1993)
Bengio, Y., Louradour, J., Collobert, R., Weston, J.: Curriculum learning. In: Proceedings of the 26th Annual International Conference on Machine Learning, pp. 41–48. ACM (2009)
Zaremba, W., Sutskever, I.: Learning to execute. arXiv preprint arXiv:1410.4615 (2014)
Graves, A., Bellemare, M.G., Menick, J., Munos, R., Kavukcuoglu, K.: Automated curriculum learning for neural networks. arXiv preprint arXiv:1704.03003 (2017)
Sukhbaatar, S., Lin, Z., Kostrikov, I., Synnaeve, G., Szlam, A., Fergus, R.: Intrinsic motivation and automatic curricula via asymmetric self-play. arXiv preprint arXiv:1703.05407 (2017)
Srivastava, R.K., Steunebrink, B.R., Schmidhuber, J.: First experiments with powerplay. Neural Netw. 41, 130–136 (2013)
Schmidhuber, J.: Optimal ordered problem solver. Mach. Learn. 54(3), 211–254 (2004)
Florensa, C., Held, D., Wulfmeier, M., Abbeel, P.: Reverse curriculum generation for reinforcement learning. arXiv preprint arXiv:1707.05300 (2017)
Thrun, S.B.: Efficient exploration in reinforcement learning (1992)
Puterman, M.L.: Markov decision processes. In: Handbooks in Operations Research and Management Science, vol. 2, pp. 331–434 (1990)
Garcia, F., Rachelson, E.: Markov decision processes. In: Markov Decision Processes in Artificial Intelligence, pp. 1–38 (2013)
Bennett, C.C., Hauser, K.: Artificial intelligence framework for simulating clinical decision-making: a Markov decision process approach. Artif. Intell. Med. 57(1), 9–19 (2013)
Lillicrap, T.P., et al.: Continuous control with deep reinforcement learning. In: International Conference on Learning Representations (2016)
Polyak, B.T., Juditsky, A.B.: Acceleration of stochastic approximation by averaging. SIAM J. Control. Optim. 30(4), 838–855 (1992)
Dearden, R., Friedman, N., Russell, S.: Bayesian Q-learning. In: AAAI/IAAI 1998, pp. 761–768 (1998)
Haarnoja, T., et al.: Soft actor-critic algorithms and applications. arXiv preprint arXiv:1812.05905 (2018)
Zhao, D., Wang, H., Shao, K., Zhu, Y.: Deep reinforcement learning with experience replay based on SARSA. In: 2016 IEEE Symposium Series on Computational Intelligence (SSCI), pp. 1–6. IEEE (2016)
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Li, C., Liu, Y., Bing, Z., Seyler, J., Eivazi, S. (2022). A Novel Reinforcement Learning Sampling Method Without Additional Environment Feedback in Hindsight Experience Replay. In: Kim, J., et al. Robot Intelligence Technology and Applications 6. RiTA 2021. Lecture Notes in Networks and Systems, vol 429. Springer, Cham. https://doi.org/10.1007/978-3-030-97672-9_42
DOI: https://doi.org/10.1007/978-3-030-97672-9_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-97671-2
Online ISBN: 978-3-030-97672-9
eBook Packages: Intelligent Technologies and Robotics (R0)