Abstract
Reinforcement learning (RL) algorithms have been demonstrated to solve a variety of continuous control tasks. However, the training efficiency and final performance of such methods limit their wider application. In this paper, we propose an off-policy heterogeneous actor-critic (HAC) algorithm that combines a soft Q-function with an ordinary Q-function. The soft Q-function encourages exploration by a Gaussian policy, while the ordinary Q-function optimizes the mean of the Gaussian policy to improve training efficiency. Experience replay memory is another vital component of off-policy RL methods, and we propose a new sampling technique that emphasizes recently experienced transitions to boost policy training. In addition, we integrate HAC with hindsight experience replay (HER) to handle sparse-reward tasks, which are common in the robotic manipulation domain. Finally, we evaluate our methods on a series of continuous control benchmark tasks and robotic manipulation tasks. The experimental results show that our method outperforms prior state-of-the-art methods in terms of training efficiency and performance, validating its effectiveness.
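To make the replay sampling idea concrete, the sketch below shows one simple way to bias minibatch sampling toward recently stored transitions, in the spirit of the recent-emphasizing replay described above. It is a minimal sketch only: the class name, the recency_temperature parameter, and the exponential recency weighting are illustrative assumptions, not the exact sampling rule proposed in the paper.

```python
import numpy as np

class RecentEmphasisReplay:
    """Replay buffer that samples recent transitions more often.

    Illustrative only: the exponential recency weighting and the
    `recency_temperature` knob are assumptions, not the paper's rule.
    """

    def __init__(self, capacity, recency_temperature=2.0):
        self.capacity = capacity
        self.recency_temperature = recency_temperature  # larger -> stronger bias toward new data
        self.storage = []      # transitions in insertion order
        self.next_index = 0    # circular write pointer once the buffer is full

    def add(self, transition):
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
        else:
            self.storage[self.next_index] = transition
        self.next_index = (self.next_index + 1) % self.capacity

    def sample(self, batch_size):
        n = len(self.storage)
        # Age 0 is the most recently written transition.
        ages = (self.next_index - 1 - np.arange(n)) % n
        weights = np.exp(-self.recency_temperature * ages / n)
        probs = weights / weights.sum()
        indices = np.random.choice(n, size=batch_size, p=probs)
        return [self.storage[i] for i in indices]

# Example usage with dummy (state, action, reward, next_state, done) tuples.
buffer = RecentEmphasisReplay(capacity=1000)
for step in range(1200):
    buffer.add((step, 0.0, 0.0, step + 1, False))
batch = buffer.sample(batch_size=32)  # skewed toward the newest transitions
```

In the full algorithm, minibatches drawn this way would feed the updates of both the soft Q-function and the ordinary Q-function.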
Acknowledgements
This work was supported by the National Key Research and Development Program of China (No. 2018AAA0103003), the National Natural Science Foundation of China (No. 61773378), the Basic Research Program (No. JCKY *******B029), and the Strategic Priority Research Program of the Chinese Academy of Sciences (No. XDB32050100).
Author information
Additional information
Recommended by Associate Editor Wing Cheong Daniel Ho
Colored figures are available in the online version at https://link.springer.com/journal/11633
Bao Xi received the B. Sc. degree in automation and the M. Eng. degree in control science and engineering from Xi'an Jiaotong University (XJTU), China in 2013 and 2016, respectively, and received the Ph. D. degree in control theory and control engineering at the State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, and University of Chinese Academy of Sciences, China in 2021.
His research interests include robotics and automation.
E-mail: xi_bao@foxmail.com
ORCID iD: 0000-0003-1495-8802
Rui Wang received the B. Eng. degree in automation from Beijing Institute of Technology, China in 2013, and the Ph.D. degree in control theory and control engineering from Institute of Automation, Chinese Academy of Sciences (CASIA), China in 2018. He is currently an assistant professor with State Key Laboratory of Management and Control for Complex Systems, CASIA.
His research interests include intelligent control, robotics, underwater robots, and biomimetic robots.
E-mail: rwang5212@ia.ac.cn
ORCID iD: 0000-0003-3172-3167
Ying-Hao Cai received the Ph.D. degree in pattern recognition and intelligent systems from Institute of Automation, Chinese Academy of Sciences, China in 2009. She was a postdoctoral research associate in Institute of Robotics and Intelligent Systems, University of Southern California, USA, and a senior research scientist in Machine Vision Group, University of Oulu, Finland. She is an associate professor in Institute of Automation, Chinese Academy of Sciences, China.
Her research interests include object detection and tracking, and computer vision in robotics.
E-mail: yinghao.cai@ia.ac.cn
ORCID iD: 0000-0003-3024-2943
Tao Lu received the B. Eng. degree in control engineering from Shandong University, China in 2002, and the Ph. D. degree in control theory and control engineering from Institute of Automation, Chinese Academy of Sciences, China in 2007. He is currently an associate professor in State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, China.
His research interest is reinforcement learning in robot manipulation.
E-mail: tao.lu@ia.ac.cn
Shuo Wang received the B.Eng. degree in electrical engineering from Shenyang Architecture and Civil Engineering Institute, China in 1995, received the M.Eng. degree in industrial automation from the Northeastern University, China in 1998, and received the Ph. D. degree in control theory and control engineering from the Institute of Automation, Chinese Academy of Sciences, China in 2001. He is currently a professor in State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, the Center for Excellence in Brain Science and Intelligence Technology of Chinese Academy of Sciences, and University of Chinese Academy of Sciences, China.
His research interests include biomimetic robots, underwater robots, and multirobot systems.
E-mail: shuo.wang@ia.ac.cn (Corresponding author)
ORCID iD: 0000-0002-1390-9219
About this article
Cite this article
Xi, B., Wang, R., Cai, YH. et al. A Novel Heterogeneous Actor-critic Algorithm with Recent Emphasizing Replay Memory. Int. J. Autom. Comput. 18, 619–631 (2021). https://doi.org/10.1007/s11633-021-1296-x