
Hierarchical dynamic movement primitive for the smooth movement of robots based on deep reinforcement learning


Abstract

Although deep reinforcement learning (DRL) algorithms with experience replay have been used to solve many sequential learning problems, applications of DRL to real-world robotics still face serious challenges, such as the problem of smooth movement: a robot’s motion trajectory must be generated smoothly, without sudden changes in acceleration or jerk. In this paper, a novel hierarchical reinforcement learning control framework, named the hierarchical dynamic movement primitive (HDMP) framework, is proposed to achieve the smooth movement of robots. In contrast to traditional algorithms, the HDMP framework consists of two learning hierarchies: a lower-level controller learning hierarchy and an upper-level policy learning hierarchy. In the lower-level hierarchy, modified dynamic movement primitives (DMPs) generate smooth motion trajectories; in the upper-level hierarchy, an improved local proximal policy optimization (L-PPO) method endows the robot with autonomous learning capabilities. The performance of the HDMP algorithm was evaluated on a classical reaching task with a Sawyer robot. The experimental results demonstrate that the proposed HDMP algorithm enables a robot to execute motor skills smoothly and to learn autonomously.
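For readers unfamiliar with the lower-level building block, the sketch below shows the classic one-dimensional discrete DMP formulation in the spirit of Ijspeert et al., which this paper modifies; the modification itself is not described in the abstract, so this is only the standard formulation, and all parameter values (gains, basis count, time step) are illustrative assumptions rather than the authors’ settings.

```python
# Minimal sketch of a classic discrete DMP (one degree of freedom).
# A critically damped spring-damper system is driven by a learnable
# forcing term, so the trajectory converges smoothly to the goal g.
import numpy as np

class DMP1D:
    def __init__(self, n_basis=20, alpha=25.0, alpha_x=1.0):
        self.alpha = alpha
        self.beta = alpha / 4.0                 # critical damping, no overshoot
        self.alpha_x = alpha_x                  # canonical-system decay rate
        # Gaussian basis centers spaced along the canonical phase x in (0, 1]
        self.c = np.exp(-alpha_x * np.linspace(0.0, 1.0, n_basis))
        self.h = n_basis ** 1.5 / self.c / alpha_x   # heuristic basis widths
        self.w = np.zeros(n_basis)              # forcing-term weights (learned)

    def rollout(self, y0, g, dt=0.01, T=1.0):
        y, dy, x = float(y0), 0.0, 1.0
        ys = []
        for _ in range(int(T / dt)):
            psi = np.exp(-self.h * (x - self.c) ** 2)
            # Forcing term vanishes as the phase x decays, guaranteeing
            # convergence to the goal regardless of the learned weights.
            f = x * (g - y0) * (psi @ self.w) / (psi.sum() + 1e-10)
            ddy = self.alpha * (self.beta * (g - y) - dy) + f
            dy += ddy * dt
            y += dy * dt
            x += -self.alpha_x * x * dt         # canonical system: x' = -a_x * x
            ys.append(y)
        return np.array(ys)

traj = DMP1D().rollout(y0=0.0, g=0.5)   # smooth reach with zero forcing term
```

With zero weights the rollout is already a smooth, non-jerky reach to the goal; learning only shapes the transient, which is what makes DMPs attractive as a lower-level trajectory generator. Similarly, the upper-level L-PPO method builds on proximal policy optimization; since the abstract does not detail the “local” modification, the snippet below shows only the standard PPO clipped surrogate objective of Schulman et al. (2017) as context.

```python
# Standard PPO clipped surrogate objective (not the paper's L-PPO variant).
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    # ratio = pi_new(a|s) / pi_old(a|s); eps is the usual clip range
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Pessimistic (elementwise minimum) bound, negated so a gradient-descent
    # optimizer can minimize it.
    return -np.mean(np.minimum(unclipped, clipped))
```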





Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant 61836003, in part by the Natural Science Foundation for Universities of Jiangsu Province under Grant 20KJB520008, and in part by the Nantong Science and Technology Plan Project under Grant JC2020148.

Author information


Correspondence to Zhu Liang Yu.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Fig. 14 The results for the joint forces during the process of reaching target point N

Fig. 15 The results for the joint forces during the process of reaching target point P

Fig. 16 The results for the joint forces during the process of reaching target point Q

Fig. 17 The results for the angle, velocity, and acceleration of each joint during the process of reaching target point N

Fig. 18 The results for the angle, velocity, and acceleration of each joint during the process of reaching target point P

Fig. 19 The results for the angle, velocity, and acceleration of each joint during the process of reaching target point Q


About this article


Cite this article

Yuan, Y., Yu, Z.L., Hua, L. et al. Hierarchical dynamic movement primitive for the smooth movement of robots based on deep reinforcement learning. Appl Intell 53, 1417–1434 (2023). https://doi.org/10.1007/s10489-022-03219-7

