A Guided Evaluation Method for Robot Dynamic Manipulation

  • Conference paper

Intelligent Robotics and Applications (ICIRA 2020)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12595)

Included in the conference series: ICIRA: International Conference on Intelligent Robotics and Applications

Abstract

It is challenging for reinforcement learning (RL) to solve dynamic-goal robot tasks in sparse-reward settings. Dynamic Hindsight Experience Replay (DHER) addresses such problems, but the policy it learns is prone to degradation and its success rate is low, especially in complex environments. To help the agent learn purposefully in dynamic-goal tasks, avoid blind exploration, and improve the stability and robustness of the policy, we propose a guided evaluation method named GEDHER, which guides the agent's learning with evaluated expert demonstrations on top of DHER. In addition, we add Gaussian noise to action sampling to balance exploration and exploitation, preventing the policy from falling into a local optimum. Experimental results show that our method outperforms the original DHER in both stability and success rate.
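The abstract points to two concrete mechanisms: guiding a DDPG/DHER-style actor-critic with an auxiliary imitation term on evaluated expert demonstrations, and adding Gaussian noise at action-sampling time to balance exploration and exploitation. The following is a minimal sketch of both ideas; it is not the authors' implementation, and every name in it (`actor`, `critic`, `sample_action`, `actor_loss`, `bc_weight`, `noise_std`, the network sizes) is an illustrative assumption.

```python
# Minimal PyTorch sketch (not the authors' code) of the two mechanisms the
# abstract describes: (1) a behavior-cloning term on evaluated expert
# demonstrations layered on a DDPG/DHER-style actor-critic, and
# (2) Gaussian noise added when sampling actions.
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

OBS_DIM, ACT_DIM = 10, 4  # hypothetical dimensions

# Illustrative networks; the paper's architectures are not reproduced here.
actor = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                      nn.Linear(64, ACT_DIM), nn.Tanh())
critic_net = nn.Sequential(nn.Linear(OBS_DIM + ACT_DIM, 64), nn.ReLU(),
                           nn.Linear(64, 1))

def critic(obs, act):
    # Q(s, a): score an observation-action pair.
    return critic_net(torch.cat([obs, act], dim=-1))

def sample_action(obs, noise_std=0.1):
    """Exploitation (deterministic actor output) plus exploration
    (additive Gaussian noise), clipped to the valid action range."""
    with torch.no_grad():
        act = actor(torch.as_tensor(obs, dtype=torch.float32)).numpy()
    act = act + np.random.normal(0.0, noise_std, size=act.shape)
    return np.clip(act, -1.0, 1.0)

def actor_loss(batch, demo_batch, bc_weight=1.0):
    """DDPG actor objective plus a behavior-cloning term computed only on
    demonstrations that passed an evaluation filter, so learning is guided
    by vetted expert behavior rather than blind exploration."""
    policy_loss = -critic(batch["obs"], actor(batch["obs"])).mean()
    bc_loss = F.mse_loss(actor(demo_batch["obs"]), demo_batch["act"])
    return policy_loss + bc_weight * bc_loss

# Toy usage with random data, just to show the shapes involved.
batch = {"obs": torch.randn(32, OBS_DIM)}
demo = {"obs": torch.randn(8, OBS_DIM),
        "act": torch.randn(8, ACT_DIM).clamp(-1, 1)}
actor_loss(batch, demo).backward()
noisy_action = sample_action(np.zeros(OBS_DIM, dtype=np.float32))
```

How demonstrations are scored before entering the demonstration buffer, and how `bc_weight` and `noise_std` are set, are defined in the paper itself; the sketch only fixes the overall shape of the update.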


References

  1. Lillicrap, T.P., et al.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015)

  2. Nair, A., et al.: Overcoming exploration in reinforcement learning with demonstrations. In: ICRA (2018)

  3. Vecerik, M., et al.: Leveraging demonstrations for deep reinforcement learning on robotics problems with sparse rewards. arXiv preprint arXiv:1707.08817 (2017)

  4. Wang, Y., et al.: An experienced-based policy gradient method for smooth manipulation. In: IEEE-CYBER (2019)

  5. Fang, M., et al.: DHER: hindsight experience replay for dynamic goals. In: ICLR (2019)

  6. Mnih, V., Kavukcuoglu, K., Silver, D., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529 (2015)

  7. Thie, P.R.: Markov Decision Processes. COMAP, Inc. (1983)

  8. Gao, Y., et al.: Reinforcement learning from imperfect demonstrations. In: International Conference on Machine Learning, Stockholm, Sweden. PMLR, vol. 80 (2018)

  9. Mnih, V., et al.: Asynchronous methods for deep reinforcement learning. In: ICML (2016)

  10. Ratliff, N., Bagnell, J.A., Srinivasa, S.S.: Imitation learning for locomotion and manipulation. In: 7th IEEE-RAS International Conference on Humanoid Robots (2007)

  11. Todorov, E., Erez, T., Tassa, Y.: MuJoCo: a physics engine for model-based control. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (2012)

  12. Popov, I., et al.: Data-efficient deep reinforcement learning for dexterous manipulation. arXiv preprint arXiv:1704.03073 (2017)

  13. Gu, S., et al.: Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates. arXiv preprint arXiv:1610.00633 (2016)

  14. Andrychowicz, M., et al.: Hindsight experience replay. In: Advances in Neural Information Processing Systems, pp. 5048–5058 (2017)

  15. Bakker, B., Schmidhuber, J.: Hierarchical reinforcement learning based on subgoal discovery and subpolicy specialization. In: Proceedings of the 8th Conference on Intelligent Autonomous Systems, pp. 438–445 (2004)

  16. Hester, T., et al.: Learning from demonstrations for real world reinforcement learning. arXiv preprint arXiv:1704.03732 (2017)

  17. Xu, K., Liu, H., Shen, H., Yang, T.: Structure design and kinematic analysis of a partially-decoupled 3T1R parallel manipulator. In: Yu, H., Liu, J., Liu, L., Ju, Z., Liu, Y., Zhou, D. (eds.) ICIRA 2019. LNCS (LNAI), vol. 11742, pp. 415–424. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-27535-8_37

  18. Heess, N., et al.: Learning continuous control policies by stochastic value gradients. In: Proceedings of the International Conference on Neural Information Processing Systems, pp. 2944–2952 (2015)


Funding

This work was supported in part by the Trico-Robot Plan of NSFC under grant No. 91748208, the National Major Project under grant No. 2018ZX01028-101, the Shaanxi Project under grant No. 2018ZDCXLGY0607, NSFC grant No. 61973246, and the program of the Ministry of Education.

Author information

Corresponding authors

Correspondence to Xuguang Lan, Lipeng Wan, Zhuo Liang or Haoyu Wang.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Feng, C., Lan, X., Wan, L., Liang, Z., Wang, H. (2020). A Guided Evaluation Method for Robot Dynamic Manipulation. In: Chan, C.S., et al. (eds.) Intelligent Robotics and Applications. ICIRA 2020. Lecture Notes in Computer Science (LNAI), vol. 12595. Springer, Cham. https://doi.org/10.1007/978-3-030-66645-3_14

  • DOI: https://doi.org/10.1007/978-3-030-66645-3_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-66644-6

  • Online ISBN: 978-3-030-66645-3

  • eBook Packages: Computer Science; Computer Science (R0)
