
Hindsight-Combined and Hindsight-Prioritized Experience Replay

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12533)

Abstract

Reinforcement learning has proved to be of great utility; training, however, may be costly due to sample inefficiency. An efficient method for improving training is experience replay, which reuses past experiences. Several experience replay techniques, namely combined experience replay, hindsight experience replay, and prioritized experience replay, have been proposed, yet their relative merits remain unclear. This study proposes two hybrid algorithms, hindsight-combined and hindsight-prioritized experience replay, and evaluates their performance against published baselines. Experimental results demonstrate the superior performance of hindsight-combined experience replay on an OpenAI Gym benchmark. Furthermore, insight into the nonconvergence of hindsight-prioritized experience replay is presented towards improving the approach.
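For context, hindsight experience replay (Andrychowicz et al., 2017) relabels stored transitions with goals that were actually achieved, turning failed episodes into useful learning signal, while combined experience replay (Zhang and Sutton, 2017) simply guarantees that the newest transition appears in every sampled minibatch. The following is a minimal sketch of how these two mechanisms might compose into hindsight-combined experience replay; it is an illustration under assumed interfaces (the buffer class, `reward_fn`, and the transition tuple layout are hypothetical), not the authors' implementation.

```python
# Minimal sketch of hindsight-combined experience replay (illustrative only):
# HER-style "final" goal relabelling on top of the CER rule that the most
# recent transition is always part of each sampled minibatch.
import random
from collections import deque


class HindsightCombinedBuffer:
    """Hypothetical buffer; entries are (state, action, reward, next_state, goal)."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def add_episode(self, episode, reward_fn):
        """episode: list of (state, action, next_state, goal) tuples.

        Stores each transition twice: once with the original goal, and once
        relabelled with the goal actually achieved at the end of the episode
        (HER's 'final' strategy), recomputing the reward in hindsight.
        """
        achieved = episode[-1][2]  # final next_state stands in for the achieved goal
        for state, action, next_state, goal in episode:
            self.buffer.append(
                (state, action, reward_fn(next_state, goal), next_state, goal))
            self.buffer.append(
                (state, action, reward_fn(next_state, achieved), next_state, achieved))

    def sample(self, batch_size):
        """CER rule: the newest transition is always included in the batch."""
        newest = self.buffer[-1]
        rest = random.sample(list(self.buffer),
                             min(batch_size - 1, len(self.buffer)))
        return [newest] + rest


# Toy usage with a sparse reward: 0 when the goal is reached, -1 otherwise.
reward_fn = lambda next_state, goal: 0.0 if next_state == goal else -1.0
buf = HindsightCombinedBuffer()
buf.add_episode([(0, 1, 1, 3), (1, 1, 2, 3)], reward_fn)
batch = buf.sample(batch_size=4)  # always contains the newest transition
```

Hindsight-prioritized experience replay would instead replace the uniform `random.sample` above with sampling proportional to priorities derived from temporal-difference errors; as the abstract notes, the paper reports that this combination fails to converge and examines why.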

Supported by the Japan Society for the Promotion of Science through the Grants-in-Aid for Scientific Research Program (KAKENHI 18K19821).


Notes

  1. This is found in: https://github.com/renzopereztan/HyER.

  2. The link is: https://gym.openai.com/envs/LunarLander-v2. A minimal loading snippet follows this list.
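As a hedged illustration of the benchmark in note 2, the snippet below loads LunarLander-v2 and runs one episode with random actions. It assumes the classic pre-0.26 Gym API (four-value `step` return) that was current when the paper was published, and the Box2D extra must be installed.

```python
# Hypothetical smoke test for the LunarLander-v2 benchmark (note 2).
# Assumes the classic Gym API (gym < 0.26); install with: pip install gym[box2d]
import gym

env = gym.make("LunarLander-v2")
obs = env.reset()                 # initial observation
done = False
total_reward = 0.0
while not done:
    action = env.action_space.sample()          # random policy placeholder
    obs, reward, done, info = env.step(action)  # classic 4-tuple step return
    total_reward += reward
env.close()
print(f"episode return: {total_reward:.1f}")
```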


Author information

Corresponding author

Correspondence to Renzo Roel P. Tan.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Tan, R.R.P., Ikeda, K., Vergara, J.P.C. (2020). Hindsight-Combined and Hindsight-Prioritized Experience Replay. In: Yang, H., Pasupa, K., Leung, A.C.S., Kwok, J.T., Chan, J.H., King, I. (eds) Neural Information Processing. ICONIP 2020. Lecture Notes in Computer Science, vol 12533. Springer, Cham. https://doi.org/10.1007/978-3-030-63833-7_36


  • DOI: https://doi.org/10.1007/978-3-030-63833-7_36

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-63832-0

  • Online ISBN: 978-3-030-63833-7

  • eBook Packages: Computer Science, Computer Science (R0)
