Abstract
Model-free reinforcement learning algorithms have been successfully applied to continuous control tasks, but they suffer from severe instability and high sample complexity. Inspired by Averaged-DQN, this paper proposes a recency-weighted target estimator for actor-critic settings: the target is constructed from several recently learned value functions, with more weight placed on the most recent ones, yielding a more stable and accurate value estimate. In addition, policy updates are delayed under more flexible control to reduce the per-update error caused by value function errors. Furthermore, to improve prioritized experience replay (PER) for continuous control tasks, Phased-PER is proposed to accelerate training in different phases. Experimental results demonstrate that, with the same hyper-parameters and architecture, the proposed algorithm is more robust and achieves better performance, surpassing existing methods on a range of continuous control benchmark tasks.
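The core idea of a recency-weighted target estimator can be sketched as follows. This is a minimal illustration, not the paper's exact method: it assumes an exponential-decay weighting over the last K critic value estimates, whereas the paper's precise weighting scheme is not given in this excerpt.

```python
import numpy as np

def recency_weighted_target(q_snapshots, decay=0.5):
    """Combine the last K value estimates for a state-action pair
    into a single target, weighting recent snapshots more heavily.

    q_snapshots: sequence of Q-value estimates, oldest first.
    decay: hypothetical per-step decay factor (0 < decay <= 1);
           decay=1 recovers the uniform average of Averaged-DQN.
    """
    k = len(q_snapshots)
    # Weight w_i proportional to decay**age, where age 0 is the newest snapshot.
    weights = np.array([decay ** (k - 1 - i) for i in range(k)])
    weights /= weights.sum()  # normalize so the target stays on the Q-value scale
    return float(np.dot(weights, np.asarray(q_snapshots, dtype=float)))
```

With `decay=0.5` and three snapshots, the newest estimate receives weight 4/7, so the target tracks recent learning progress while older snapshots still smooth out estimation noise.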
This work is in part supported by the Natural Science Foundation of China (61876119), the Natural Science Foundation of Jiangsu (BK20181432) and a project funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions.
References
Andrychowicz, M., et al.: Hindsight experience replay. In: Advances in Neural Information Processing Systems (NIPS), pp. 5048–5058 (2017)
Anschel, O., Baram, N., Shimkin, N.: Averaged-DQN: variance reduction and stabilization for deep reinforcement learning. In: International Conference on Machine Learning (ICML), pp. 176–185 (2017)
Brockman, G., et al.: OpenAI gym. arXiv preprint arXiv:1606.01540 (2016)
Fujimoto, S., van Hoof, H., Meger, D.: Addressing function approximation error in actor-critic methods. In: International Conference on Machine Learning (ICML), pp. 1587–1596 (2018)
Haarnoja, T., et al.: Soft actor-critic algorithms and applications. arXiv preprint arXiv:1812.05905 (2018)
van Hasselt, H.: Double Q-learning. In: Advances in Neural Information Processing Systems (NIPS), pp. 2613–2621 (2010)
Hessel, M., et al.: Rainbow: combining improvements in deep reinforcement learning. In: Thirty-Second AAAI Conference on Artificial Intelligence (AAAI) (2018)
Horgan, D., et al.: Distributed prioritized experience replay. arXiv preprint arXiv:1803.00933 (2018)
Liu, Q., et al.: A survey on deep reinforcement learning. Chin. J. Comput. 41(1), 1–27 (2018)
Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)
Munos, R., Stepleton, T., Harutyunyan, A., Bellemare, M.: Safe and efficient off-policy reinforcement learning. In: Advances in Neural Information Processing Systems (NIPS), pp. 1054–1062 (2016)
Schaul, T., Quan, J., Antonoglou, I., Silver, D.: Prioritized experience replay. arXiv preprint arXiv:1511.05952 (2015)
Shi, W., Song, S., Wu, H., Hsu, Y.C., Wu, C., Huang, G.: Regularized Anderson acceleration for off-policy deep reinforcement learning. In: Advances in Neural Information Processing Systems (NeurIPS), pp. 10231–10241 (2019)
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (2018)
Todorov, E., Erez, T., Tassa, Y.: MuJoCo: a physics engine for model-based control. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 5026–5033. IEEE (2012)
Watkins, C.J.C.H.: Learning from delayed rewards. Ph.D. thesis, King’s College, University of Cambridge (1989). https://ci.nii.ac.jp/naid/10000072699/en/
Zhang, Z., Pan, Z., Kochenderfer, M.J.: Weighted double Q-learning. In: International Joint Conference on Artificial Intelligence (IJCAI), pp. 3455–3461 (2017)
Zheng, Y., Hao, J., Zhang, Z., Meng, Z., Hao, X.: Efficient multiagent policy optimization based on weighted estimators in stochastic cooperative environments. J. Comput. Sci. Technol. 35, 268–280 (2020). https://doi.org/10.1007/s11390-020-9967-6
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Wu, Z., Zhang, Z., Zhang, X. (2020). Recency-Weighted Acceleration for Continuous Control Through Deep Reinforcement Learning. In: Yang, H., Pasupa, K., Leung, A.C.S., Kwok, J.T., Chan, J.H., King, I. (eds) Neural Information Processing. ICONIP 2020. Lecture Notes in Computer Science, vol 12533. Springer, Cham. https://doi.org/10.1007/978-3-030-63833-7_51
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63832-0
Online ISBN: 978-3-030-63833-7
eBook Packages: Computer Science, Computer Science (R0)