Abstract
Model-free reinforcement learning algorithms have been successfully applied to continuous control tasks, but they suffer from severe instability and high sample complexity. Inspired by Averaged-DQN, this paper proposes a recency-weighted target estimator for actor-critic settings: the target is constructed from several recently learned value functions, with more weight placed on the most recent ones, yielding a more stable and accurate value estimate. In addition, policy updates are delayed under more flexible control to reduce the per-update error caused by value function errors. Furthermore, to improve prioritized experience replay (PER) for continuous control tasks, Phased-PER is proposed to accelerate training in different phases. Experimental results demonstrate that, with the same hyper-parameters and architecture, the proposed algorithm is more robust and achieves better performance, surpassing existing methods on a range of continuous control benchmark tasks.
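The core idea of a recency-weighted target estimator can be sketched as follows. This is a minimal illustration, not the paper's exact method: it assumes an exponential-decay weighting over the last K critic value estimates, whereas the paper's precise weighting scheme is not given in this excerpt.

```python
import numpy as np

def recency_weighted_target(q_snapshots, decay=0.5):
    """Combine the last K value estimates for a state-action pair
    into a single target, weighting recent snapshots more heavily.

    q_snapshots: sequence of Q-value estimates, oldest first.
    decay: hypothetical per-step decay factor (0 < decay <= 1);
           decay=1 recovers the uniform average of Averaged-DQN.
    """
    k = len(q_snapshots)
    # Weight w_i proportional to decay**age, where age 0 is the newest snapshot.
    weights = np.array([decay ** (k - 1 - i) for i in range(k)])
    weights /= weights.sum()  # normalize so the target stays on the Q-value scale
    return float(np.dot(weights, np.asarray(q_snapshots, dtype=float)))
```

With `decay=0.5` and three snapshots, the newest estimate receives weight 4/7, so the target tracks recent learning progress while older snapshots still smooth out estimation noise.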
This work is in part supported by the Natural Science Foundation of China (61876119), the Natural Science Foundation of Jiangsu (BK20181432) and a project funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions.
References
Andrychowicz, M., et al.: Hindsight experience replay. In: Advances in Neural Information Processing Systems (NIPS), pp. 5048–5058 (2017)
Anschel, O., Baram, N., Shimkin, N.: Averaged-DQN: variance reduction and stabilization for deep reinforcement learning. In: International Conference on Machine Learning (ICML), pp. 176–185 (2017)
Brockman, G., et al.: OpenAI gym. arXiv preprint arXiv:1606.01540 (2016)
Fujimoto, S., van Hoof, H., Meger, D.: Addressing function approximation error in actor-critic methods. In: International Conference on Machine Learning (ICML), pp. 1587–1596 (2018)
Haarnoja, T., et al.: Soft actor-critic algorithms and applications. arXiv preprint arXiv:1812.05905 (2018)
van Hasselt, H.: Double Q-learning. In: Advances in Neural Information Processing Systems (NIPS), pp. 2613–2621 (2010)
Hessel, M., et al.: Rainbow: combining improvements in deep reinforcement learning. In: Thirty-Second AAAI Conference on Artificial Intelligence (AAAI) (2018)
Horgan, D., et al.: Distributed prioritized experience replay. arXiv preprint arXiv:1803.00933 (2018)
Liu, Q., et al.: A survey on deep reinforcement learning. Chin. J. Comput. 41(1), 1–27 (2018)
Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)
Munos, R., Stepleton, T., Harutyunyan, A., Bellemare, M.: Safe and efficient off-policy reinforcement learning. In: Advances in Neural Information Processing Systems (NIPS), pp. 1054–1062 (2016)
Schaul, T., Quan, J., Antonoglou, I., Silver, D.: Prioritized experience replay. arXiv preprint arXiv:1511.05952 (2015)
Shi, W., Song, S., Wu, H., Hsu, Y.C., Wu, C., Huang, G.: Regularized Anderson acceleration for off-policy deep reinforcement learning. In: Advances in Neural Information Processing Systems (NeurIPS), pp. 10231–10241 (2019)
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (2018)
Todorov, E., Erez, T., Tassa, Y.: MuJoCo: a physics engine for model-based control. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 5026–5033. IEEE (2012)
Watkins, C.J.C.H.: Learning from delayed rewards. Ph.D. thesis, King’s College, University of Cambridge (1989). https://ci.nii.ac.jp/naid/10000072699/en/
Zhang, Z., Pan, Z., Kochenderfer, M.J.: Weighted double Q-learning. In: International Joint Conference on Artificial Intelligence (IJCAI), pp. 3455–3461 (2017)
Zheng, Y., Hao, J., Zhang, Z., Meng, Z., Hao, X.: Efficient multiagent policy optimization based on weighted estimators in stochastic cooperative environments. J. Comput. Sci. Technol. 35, 268–280 (2020). https://doi.org/10.1007/s11390-020-9967-6
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Wu, Z., Zhang, Z., Zhang, X. (2020). Recency-Weighted Acceleration for Continuous Control Through Deep Reinforcement Learning. In: Yang, H., Pasupa, K., Leung, A.C.S., Kwok, J.T., Chan, J.H., King, I. (eds) Neural Information Processing. ICONIP 2020. Lecture Notes in Computer Science, vol 12533. Springer, Cham. https://doi.org/10.1007/978-3-030-63833-7_51
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63832-0
Online ISBN: 978-3-030-63833-7
eBook Packages: Computer Science, Computer Science (R0)