
An Efficiently Convergent Deep Reinforcement Learning-Based Trajectory Planning Method for Manipulators in Dynamic Environments

Short Paper · Published in the Journal of Intelligent & Robotic Systems

Abstract

Deep reinforcement learning (DRL)-based methods have recently been applied to manipulator trajectory planning, given their potential for solving multidimensional spatial trajectory planning problems. However, many DRL models proposed for manipulators working in dynamic environments struggle to learn an optimal policy and therefore fail to converge, owing to massive ineffective exploration and sparse rewards. In this paper, we address this inefficient convergence at two levels: the action selection strategy and the reward function. First, we design a dynamic action selection strategy that uses a variable guide item to provide positive samples with high probability in the early stage of training, effectively reducing invalid exploration. Second, we propose a combinatorial reward function that combines the artificial potential field method with a time-energy function, greatly improving the efficiency and stability of DRL-based trajectory planning for manipulators in dynamic working environments. Extensive experiments are conducted in a CoppeliaSim simulation with a freely moving obstacle and a 6-DOF manipulator. The results show that the proposed dynamic action selection strategy and combinatorial reward function improve the convergence rate of the DDPG, TD3, and SAC algorithms by a factor of 3 to 5. Furthermore, the mean reward increases by a factor of 1.47 to 2.70, and its standard deviation decreases by 27.56% to 56.60%.
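
The abstract describes the two contributions only at a high level. As a concrete illustration, the following Python sketch shows one plausible reading of the dynamic action selection strategy: a heuristic guide action pointing the end effector toward the goal is blended with the policy's action, and the guide's weight decays over episodes so the learned policy gradually takes over. The class name, the guide heuristic, and the decay schedule are illustrative assumptions, not the authors' exact formulation.

    import numpy as np

    class GuidedActionSelector:
        """Hypothetical sketch of a dynamic action selection strategy.

        Early in training the heuristic guide item dominates, which makes
        positive samples (progress toward the goal) far more likely; its
        weight decays each episode until the policy acts alone.
        """

        def __init__(self, guide_weight=1.0, decay=0.995, min_weight=0.0):
            self.guide_weight = guide_weight  # current influence of the guide item
            self.decay = decay                # per-episode multiplicative decay
            self.min_weight = min_weight      # floor for the guide's influence

        def guide_action(self, ee_pos, goal_pos, max_step=0.05):
            # Heuristic guide: a small Cartesian step toward the goal.
            direction = np.asarray(goal_pos) - np.asarray(ee_pos)
            norm = np.linalg.norm(direction)
            if norm < 1e-6:
                return np.zeros_like(direction)
            return max_step * direction / norm

        def select(self, policy_action, ee_pos, goal_pos):
            # Convex combination of the guide item and the policy's action.
            w = self.guide_weight
            return w * self.guide_action(ee_pos, goal_pos) \
                + (1.0 - w) * np.asarray(policy_action)

        def end_episode(self):
            self.guide_weight = max(self.min_weight,
                                    self.guide_weight * self.decay)

The combinatorial reward can be sketched in the same spirit: a standard artificial-potential-field term (quadratic attraction to the goal, repulsion inside an influence radius around the obstacle) summed with a time-energy penalty. The gains, the influence radius, and the exact functional forms below are assumptions for illustration, not the paper's published coefficients.

    def combinatorial_reward(ee_pos, goal_pos, obstacle_pos,
                             joint_velocities, dt,
                             k_att=1.0, k_rep=0.5, d0=0.2,
                             w_time=0.01, w_energy=0.001):
        # Attractive potential: quadratic in the end-effector-to-goal distance.
        d_goal = np.linalg.norm(np.asarray(goal_pos) - np.asarray(ee_pos))
        r_att = -k_att * d_goal ** 2

        # Repulsive potential: active only within the influence radius d0.
        d_obs = np.linalg.norm(np.asarray(obstacle_pos) - np.asarray(ee_pos))
        r_rep = -k_rep * (1.0 / d_obs - 1.0 / d0) ** 2 if 1e-6 < d_obs < d0 else 0.0

        # Time-energy term: penalize elapsed time and actuation effort.
        r_te = -w_time * dt - w_energy * float(np.sum(np.square(joint_velocities)))

        return r_att + r_rep + r_te

A dense reward of this shape gives the DDPG, TD3, and SAC critics a learning signal long before the sparse goal-reached event fires, which is consistent with the convergence improvement the abstract reports.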


Availability of Code and Data

The code and datasets generated and/or analysed during the current study are available from the corresponding author on reasonable request.


Acknowledgements

This work was supported by the National Key R&D Program of China (Grant number: 2018YFB1307400).

Funding

This work was supported by the National Key R&D Program of China (Grant number: 2018YFB1307400).

Author information

Contributions

All authors contributed to the study conception and design. Conceptualization, methodology, software, data curation, writing (original draft preparation), and investigation were performed by Li Zheng. Data curation was also performed by YaHao Wang, Run Yang, Shaolei Wu, Rui Guo and Erbao Dong. Supervision, writing, reviewing, and editing were performed by Erbao Dong. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Erbao Dong.

Ethics declarations

Ethics approval

Not applicable

Consent to participate

Not applicable

Consent for Publication

The authors affirm that human research participants provided informed consent for publication of the images in Fig. 1.

Competing interests

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zheng, L., Wang, Y., Yang, R. et al. An Efficiently Convergent Deep Reinforcement Learning-Based Trajectory Planning Method for Manipulators in Dynamic Environments. J Intell Robot Syst 107, 50 (2023). https://doi.org/10.1007/s10846-023-01822-5

