
Hierarchical dynamic movement primitive for the smooth movement of robots based on deep reinforcement learning


Abstract

Although deep reinforcement learning (DRL) algorithms with experience replay have been used to solve many sequential learning problems, applications of DRL to real-world robotics still face serious challenges, such as the problem of smooth movement: a robot’s motion trajectory must be generated smoothly, without sudden changes in acceleration or jerk. In this paper, a novel hierarchical reinforcement learning control framework, named the hierarchical dynamic movement primitive (HDMP) framework, is proposed to achieve the smooth movement of robots. In contrast to traditional algorithms, the HDMP framework consists of two learning hierarchies: a lower-level controller learning hierarchy and an upper-level policy learning hierarchy. In the lower-level hierarchy, modified dynamic movement primitives (DMPs) generate smooth motion trajectories; in the upper-level hierarchy, an improved local proximal policy optimization (L-PPO) method endows the robot with autonomous learning capabilities. The performance of the HDMP algorithm was evaluated on a classical reaching task with a Sawyer robot. The experimental results demonstrate that the proposed HDMP algorithm enables a robot to execute motor skills smoothly and to learn autonomously.
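For readers unfamiliar with the lower-level building block, the sketch below shows the classic one-dimensional discrete DMP formulation in the spirit of Ijspeert et al., which this paper modifies; the modification itself is not described in the abstract, so this is only the standard formulation, and all parameter values (gains, basis count, time step) are illustrative assumptions rather than the authors’ settings.

```python
# Minimal sketch of a classic discrete DMP (one degree of freedom).
# A critically damped spring-damper system is driven by a learnable
# forcing term, so the trajectory converges smoothly to the goal g.
import numpy as np

class DMP1D:
    def __init__(self, n_basis=20, alpha=25.0, alpha_x=1.0):
        self.alpha = alpha
        self.beta = alpha / 4.0                 # critical damping, no overshoot
        self.alpha_x = alpha_x                  # canonical-system decay rate
        # Gaussian basis centers spaced along the canonical phase x in (0, 1]
        self.c = np.exp(-alpha_x * np.linspace(0.0, 1.0, n_basis))
        self.h = n_basis ** 1.5 / self.c / alpha_x   # heuristic basis widths
        self.w = np.zeros(n_basis)              # forcing-term weights (learned)

    def rollout(self, y0, g, dt=0.01, T=1.0):
        y, dy, x = float(y0), 0.0, 1.0
        ys = []
        for _ in range(int(T / dt)):
            psi = np.exp(-self.h * (x - self.c) ** 2)
            # Forcing term vanishes as the phase x decays, guaranteeing
            # convergence to the goal regardless of the learned weights.
            f = x * (g - y0) * (psi @ self.w) / (psi.sum() + 1e-10)
            ddy = self.alpha * (self.beta * (g - y) - dy) + f
            dy += ddy * dt
            y += dy * dt
            x += -self.alpha_x * x * dt         # canonical system: x' = -a_x * x
            ys.append(y)
        return np.array(ys)

traj = DMP1D().rollout(y0=0.0, g=0.5)   # smooth reach with zero forcing term
```

With zero weights the rollout is already a smooth, non-jerky reach to the goal; learning only shapes the transient, which is what makes DMPs attractive as a lower-level trajectory generator. Similarly, the upper-level L-PPO method builds on proximal policy optimization; since the abstract does not detail the “local” modification, the snippet below shows only the standard PPO clipped surrogate objective of Schulman et al. (2017) as context.

```python
# Standard PPO clipped surrogate objective (not the paper's L-PPO variant).
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    # ratio = pi_new(a|s) / pi_old(a|s); eps is the usual clip range
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Pessimistic (elementwise minimum) bound, negated so a gradient-descent
    # optimizer can minimize it.
    return -np.mean(np.minimum(unclipped, clipped))
```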





Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant 61836003, in part by the Natural Science Foundation for Universities of Jiangsu Province under Grant 20KJB520008, and in part by the Nantong Science and Technology Plan Project under Grant JC2020148.

Author information


Correspondence to Zhu Liang Yu.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Fig. 14 The results for the joint forces during the process of reaching target point N

Fig. 15 The results for the joint forces during the process of reaching target point P

Fig. 16 The results for the joint forces during the process of reaching target point Q

Fig. 17 The results for the angle, velocity, and acceleration of each joint during the process of reaching target point N

Fig. 18 The results for the angle, velocity, and acceleration of each joint during the process of reaching target point P

Fig. 19 The results for the angle, velocity, and acceleration of each joint during the process of reaching target point Q


About this article


Cite this article

Yuan, Y., Yu, Z.L., Hua, L. et al. Hierarchical dynamic movement primitive for the smooth movement of robots based on deep reinforcement learning. Appl Intell 53, 1417–1434 (2023). https://doi.org/10.1007/s10489-022-03219-7

