Abstract
Although deep reinforcement learning (DRL) algorithms with experience replay have been used to solve many sequential decision-making problems, applications of DRL in real-world robotics still face serious challenges, such as the problem of smooth movement: a robot's motion trajectory must be smoothly encoded, without abrupt changes in acceleration or jerk. In this paper, a novel hierarchical reinforcement learning control framework, named the hierarchical dynamic movement primitive (HDMP) framework, is proposed to achieve smooth robot movement. In contrast to traditional algorithms, the HDMP framework consists of two learning hierarchies: a lower-level controller learning hierarchy and an upper-level policy learning hierarchy. In the lower-level hierarchy, modified dynamic movement primitives (DMPs) generate smooth motion trajectories; in the upper-level hierarchy, an improved local proximal policy optimization (L-PPO) method endows the robot with autonomous learning capabilities. The performance of the HDMP algorithm is evaluated on a classical reaching task with a Sawyer robot. The experimental results demonstrate that the proposed HDMP algorithm enables a robot to execute motor skills smoothly and to learn autonomously.
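To make the lower-level hierarchy concrete, the sketch below shows a minimal one-dimensional discrete DMP in the standard Ijspeert-style formulation that such frameworks build on. It is an illustration only, not the paper's modified DMP: the class name, gains, and basis count are assumptions chosen for the example.

```python
import numpy as np

class DMP:
    """One-dimensional discrete dynamic movement primitive (standard form).

    A critically damped spring-damper system pulled toward a goal g and
    shaped by a learnable forcing term, so the rollout has continuous
    velocity and acceleration by construction.
    """

    def __init__(self, n_basis=20, alpha=25.0, alpha_s=4.0, tau=1.0):
        self.alpha = alpha                  # spring gain (illustrative value)
        self.beta = alpha / 4.0             # damper gain: critical damping
        self.alpha_s, self.tau = alpha_s, tau
        # Gaussian basis functions spaced along the phase variable s
        self.c = np.exp(-alpha_s * np.linspace(0.0, 1.0, n_basis))
        self.h = 1.0 / np.abs(np.gradient(self.c)) ** 2
        self.w = np.zeros(n_basis)          # weights, set by learning/imitation

    def _forcing(self, s, y0, g):
        psi = np.exp(-self.h * (s - self.c) ** 2)
        return (psi @ self.w) / (psi.sum() + 1e-10) * s * (g - y0)

    def rollout(self, y0, g, dt=0.01, duration=1.0):
        y, z, s = float(y0), 0.0, 1.0       # position, scaled velocity, phase
        traj = []
        for _ in range(int(duration / dt)):
            f = self._forcing(s, y0, g)
            z += dt / self.tau * (self.alpha * (self.beta * (g - y) - z) + f)
            y += dt / self.tau * z
            s += dt / self.tau * (-self.alpha_s * s)  # canonical system decay
            traj.append(y)
        return np.array(traj)

# Example: with zero weights the primitive converges smoothly from 0 to the goal.
if __name__ == "__main__":
    traj = DMP().rollout(y0=0.0, g=1.0)
    print(traj[[0, 25, 50, 99]])  # monotone, smooth approach to g = 1.0
```

In the full framework, the upper-level L-PPO policy would select the primitive's parameters (for example, the goal g or the weights w) on each episode; L-PPO builds on PPO's standard clipped surrogate objective, which caps the policy probability ratio within 1 ± ε during each update.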













Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under Grant 61836003, in part by the Natural Science Foundation for Universities of Jiangsu Province under Grant 20KJB520008, and in part by the Nantong Science and Technology Plan Project under Grant JC2020148.
Cite this article
Yuan, Y., Yu, Z.L., Hua, L. et al. Hierarchical dynamic movement primitive for the smooth movement of robots based on deep reinforcement learning. Appl Intell 53, 1417–1434 (2023). https://doi.org/10.1007/s10489-022-03219-7