
Recency-Weighted Acceleration for Continuous Control Through Deep Reinforcement Learning

  • Conference paper, in Neural Information Processing (ICONIP 2020)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12533)


Abstract

Model-free reinforcement learning algorithms have been successfully applied to continuous control tasks, but they suffer from severe instability and high sample complexity. Inspired by Averaged-DQN, this paper proposes a recency-weighted target estimator for actor-critic settings, which constructs a target estimate by placing more weight on recently learned value functions, yielding a more stable and accurate value estimator. In addition, delayed policy updates with more flexible control are adopted to reduce the per-update error caused by errors in the value function. Furthermore, to improve the performance of prioritized experience replay (PER) on continuous control tasks, Phased-PER is proposed to accelerate training during different phases. Experimental results demonstrate that, with the same hyper-parameters and architecture, the proposed algorithm is more robust and achieves better performance, surpassing existing methods on a range of continuous control benchmark tasks.
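The abstract only sketches the recency-weighted target estimator, so the snippet below is a minimal illustration of the general idea rather than the paper's exact method: it combines the K most recent target-critic estimates of the next-state value using weights that decay geometrically with age (newer snapshots count more), and plugs the result into a one-step TD target. The function name, the geometric weighting scheme, and parameters such as decay and q_snapshots are assumptions made for this sketch.

```python
import numpy as np

def recency_weighted_target(q_snapshots, rewards, dones, gamma=0.99, decay=0.7):
    """Recency-weighted TD target built from the last K critic snapshots.

    q_snapshots: list of length K; q_snapshots[k] holds next-state action
        values from the k-th most recent target critic (k = 0 is newest).
    decay: geometric decay applied to older snapshots (hypothetical choice;
        the paper's exact weighting may differ).
    """
    K = len(q_snapshots)
    # Newer snapshots receive larger weights: w_k proportional to decay**k.
    weights = np.array([decay ** k for k in range(K)])
    weights /= weights.sum()
    # Recency-weighted combination of the K value estimates.
    q_next = sum(w * q for w, q in zip(weights, q_snapshots))
    # Standard one-step TD target using the combined estimate.
    return rewards + gamma * (1.0 - dones) * q_next

# Toy usage: three snapshots of next-state values for a batch of two transitions.
snapshots = [np.array([1.0, 2.0]), np.array([0.8, 2.2]), np.array([0.5, 2.5])]
rewards = np.array([0.1, -0.2])
dones = np.array([0.0, 1.0])
print(recency_weighted_target(snapshots, rewards, dones))
```

Averaging the snapshots uniformly would recover an Averaged-DQN-style target; tilting the weights toward recent critics trades some variance reduction for a less stale, more accurate estimate.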

This work is supported in part by the Natural Science Foundation of China (61876119), the Natural Science Foundation of Jiangsu (BK20181432), and a project funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions.


References

  1. Andrychowicz, M., et al.: Hindsight experience replay. In: Advances in Neural Information Processing Systems (NIPS), pp. 5048–5058 (2017)

  2. Anschel, O., Baram, N., Shimkin, N.: Averaged-DQN: variance reduction and stabilization for deep reinforcement learning. In: International Conference on Machine Learning (ICML), pp. 176–185 (2017)

  3. Brockman, G., et al.: OpenAI gym. arXiv preprint arXiv:1606.01540 (2016)

  4. Fujimoto, S., Hoof, H., Meger, D.: Addressing function approximation error in actor-critic methods. In: International Conference on Machine Learning (ICML), pp. 1587–1596 (2018)

  5. Haarnoja, T., et al.: Soft actor-critic algorithms and applications. arXiv preprint arXiv:1812.05905 (2018)

  6. Hasselt, H.V.: Double Q-learning. In: Advances in Neural Information Processing Systems (NIPS), pp. 2613–2621 (2010)

  7. Hessel, M., et al.: Rainbow: combining improvements in deep reinforcement learning. In: Thirty-Second AAAI Conference on Artificial Intelligence (AAAI) (2018)

  8. Horgan, D., et al.: Distributed prioritized experience replay. arXiv preprint arXiv:1803.00933 (2018)

  9. Liu, Q., et al.: A survey on deep reinforcement learning. Chin. J. Comput. 41(1), 1–27 (2018)

  10. Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)

  11. Munos, R., Stepleton, T., Harutyunyan, A., Bellemare, M.: Safe and efficient off-policy reinforcement learning. In: Advances in Neural Information Processing Systems (NIPS), pp. 1054–1062 (2016)

  12. Schaul, T., Quan, J., Antonoglou, I., Silver, D.: Prioritized experience replay. arXiv preprint arXiv:1511.05952 (2015)

  13. Shi, W., Song, S., Wu, H., Hsu, Y.C., Wu, C., Huang, G.: Regularized Anderson acceleration for off-policy deep reinforcement learning. In: Advances in Neural Information Processing Systems (NeurIPS), pp. 10231–10241 (2019)

  14. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (2018)

  15. Todorov, E., Erez, T., Tassa, Y.: MuJoCo: a physics engine for model-based control. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 5026–5033. IEEE (2012)

  16. Watkins, C.J.C.H.: Learning from delayed rewards. Ph.D. thesis, King’s College, University of Cambridge (1989). https://ci.nii.ac.jp/naid/10000072699/en/

  17. Zhang, Z., Pan, Z., Kochenderfer, M.J.: Weighted double Q-learning. In: International Joint Conference on Artificial Intelligence (IJCAI), pp. 3455–3461 (2017)

  18. Zheng, Y., Hao, J., Zhang, Z., Meng, Z., Hao, X.: Efficient multiagent policy optimization based on weighted estimators in stochastic cooperative environments. J. Comput. Sci. Technol. 35, 268–280 (2020). https://doi.org/10.1007/s11390-020-9967-6


Author information

Correspondence to Zongzhang Zhang.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Wu, Z., Zhang, Z., Zhang, X. (2020). Recency-Weighted Acceleration for Continuous Control Through Deep Reinforcement Learning. In: Yang, H., Pasupa, K., Leung, A.C.S., Kwok, J.T., Chan, J.H., King, I. (eds) Neural Information Processing. ICONIP 2020. Lecture Notes in Computer Science, vol 12533. Springer, Cham. https://doi.org/10.1007/978-3-030-63833-7_51

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-63833-7_51

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-63832-0

  • Online ISBN: 978-3-030-63833-7

  • eBook Packages: Computer Science, Computer Science (R0)
