
A Novel Heterogeneous Actor-critic Algorithm with Recent Emphasizing Replay Memory

  • Research Article
  • Published:
International Journal of Automation and Computing


Abstract

Reinforcement learning (RL) algorithms have been demonstrated to solve a variety of continuous control tasks. However, the training efficiency and performance of such methods limit further applications. In this paper, we propose an off-policy heterogeneous actor-critic (HAC) algorithm, which contains a soft Q-function and an ordinary Q-function. The soft Q-function encourages exploration by a Gaussian policy, while the ordinary Q-function optimizes the mean of the Gaussian policy to improve training efficiency. Experience replay memory is another vital component of off-policy RL methods, and we propose a new sampling technique that emphasizes recently experienced transitions to boost policy training. In addition, we integrate HAC with hindsight experience replay (HER) to handle sparse-reward tasks, which are common in the robotic manipulation domain. Finally, we evaluate our method on a series of continuous control benchmark tasks and robotic manipulation tasks. The experimental results show that our method outperforms prior state-of-the-art methods in training efficiency and performance, which validates its effectiveness.
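
To make the recent-emphasizing sampling idea concrete, the sketch below shows one way such a replay buffer could be implemented, following an ERE-style (emphasizing recent experience) scheme in which the k-th minibatch of an update phase is drawn uniformly from only the most recent c_k transitions. The class name, the decay parameter eta, the floor c_min, and the 1000/num_updates schedule are illustrative assumptions, not the exact formulation used in the paper.

import random
from collections import deque


class RecentEmphasizingReplay:
    """Replay buffer that biases minibatch sampling toward recent transitions.

    Illustrative sketch: the k-th minibatch of an update phase is drawn
    uniformly from the most recent c_k transitions, where c_k shrinks
    geometrically with k (eta and c_min are assumed hyper-parameters).
    """

    def __init__(self, capacity=1_000_000, eta=0.996, c_min=5_000):
        self.buffer = deque(maxlen=capacity)
        self.eta = eta      # decay rate controlling how strongly recent data is emphasized
        self.c_min = c_min  # lower bound so older data is never excluded entirely

    def add(self, transition):
        # transition = (state, action, reward, next_state, done)
        self.buffer.append(transition)

    def sample(self, batch_size, k, num_updates):
        n = len(self.buffer)
        # Shrink the sampling window for later minibatches of the update phase.
        c_k = int(n * self.eta ** (k * 1000.0 / num_updates))
        c_k = min(max(c_k, self.c_min), n)
        recent = list(self.buffer)[-c_k:]  # most recent c_k transitions
        return random.sample(recent, min(batch_size, len(recent)))

With eta set to 1 the window never shrinks and the buffer reduces to ordinary uniform replay, so the emphasis on recent data can be tuned or disabled if it destabilizes training.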



Acknowledgements

This work was supported by National Key Research and Development Program of China (No. 2018AAA0103003), National Natural Science Foundation of China (No. 61773378), Basic Research Program (No. JCKY *******B029), and Strategic Priority Research Program of Chinese Academy of Sciences (No. XDB32050100).

Author information


Corresponding author

Correspondence to Shuo Wang.

Additional information

Recommended by Associate Editor Wing Cheong Daniel Ho

Colored figures are available in the online version at https://link.springer.com/journal/11633

Bao Xi received the B. Sc. degree in automation and the M. Eng. degree in control science and engineering from Xi’an Jiaotong University (XJTU), China in 2013 and 2016, respectively, and the Ph. D. degree in control theory and control engineering from the State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, and University of Chinese Academy of Sciences, China in 2021.

His research interests include robotics and automation.

E-mail: xi_bao@foxmail.com

ORCID iD: 0000-0003-1495-8802

Rui Wang received the B. Eng. degree in automation from Beijing Institute of Technology, China in 2013, and the Ph.D. degree in control theory and control engineering from Institute of Automation, Chinese Academy of Sciences (CASIA), China in 2018. He is currently an assistant professor with State Key Laboratory of Management and Control for Complex Systems, CASIA.

His research interests include intelligent control, robotics, underwater robots, and biomimetic robots.

E-mail: rwang5212@ia.ac.cn

ORCID iD: 0000-0003-3172-3167

Ying-Hao Cai received the Ph.D. degree in pattern recognition and intelligent systems from the Institute of Automation, Chinese Academy of Sciences, China in 2009. She was a postdoctoral research associate at the Institute of Robotics and Intelligent Systems, University of Southern California, USA, and a senior research scientist in the Machine Vision Group, University of Oulu, Finland. She is currently an associate professor at the Institute of Automation, Chinese Academy of Sciences, China.

Her research interests include object detection and tracking and computer vision in robotics.

E-mail: yinghao.cai@ia.ac.cn

ORCID iD: 0000-0003-3024-2943

Tao Lu received the B. Eng. degree in control engineering from Shandong University, China in 2002, and the Ph. D. degree in control theory and control engineering from Institute of Automation, Chinese Academy of Sciences, China in 2007. He is currently an associate professor in State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, China.

His research interest is reinforcement learning in robot manipulation.

E-mail: tao.lu@ia.ac.cn

Shuo Wang received the B.Eng. degree in electrical engineering from Shenyang Architecture and Civil Engineering Institute, China in 1995, the M.Eng. degree in industrial automation from Northeastern University, China in 1998, and the Ph.D. degree in control theory and control engineering from the Institute of Automation, Chinese Academy of Sciences, China in 2001. He is currently a professor in the State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, the Center for Excellence in Brain Science and Intelligence Technology of Chinese Academy of Sciences, and University of Chinese Academy of Sciences, China.

His research interests include biomimetic robots, underwater robots, and multi-robot systems.

E-mail: shuo.wang@ia.ac.cn (Corresponding author)

ORCID iD: 0000-0002-1390-9219


About this article


Cite this article

Xi, B., Wang, R., Cai, YH. et al. A Novel Heterogeneous Actor-critic Algorithm with Recent Emphasizing Replay Memory. Int. J. Autom. Comput. 18, 619–631 (2021). https://doi.org/10.1007/s11633-021-1296-x


Keywords