
An Efficiently Convergent Deep Reinforcement Learning-Based Trajectory Planning Method for Manipulators in Dynamic Environments

Short Paper · Published in the Journal of Intelligent & Robotic Systems

Abstract

Deep reinforcement learning (DRL)-based methods have recently been applied to manipulator trajectory planning, given their potential for solving multidimensional spatial trajectory planning problems. However, many DRL models proposed for manipulators working in dynamic environments struggle to learn an optimal policy and therefore fail to converge, owing to massive ineffective exploration and sparse rewards. In this paper, we address this inefficient convergence at two levels: the action selection strategy and the reward function. First, we design a dynamic action selection strategy that uses a variable guide item to provide positive samples with high probability in the early stage of training, effectively reducing invalid exploration. Second, we propose a combinatorial reward function that combines the artificial potential field method with a time-energy function, greatly improving the efficiency and stability of DRL-based trajectory planning for manipulators in dynamic working environments. Extensive experiments are conducted in a CoppeliaSim simulation with a freely moving obstacle and a 6-DOF manipulator. The results show that the proposed dynamic action selection strategy and combinatorial reward function improve the convergence rate of the DDPG, TD3, and SAC algorithms by a factor of 3 to 5. Furthermore, the mean reward increases by a factor of 1.47 to 2.70, and its standard deviation decreases by 27.56% to 56.60%.
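
The abstract describes the two contributions only at a high level. As a concrete illustration, the following Python sketch shows one plausible reading of the dynamic action selection strategy: a heuristic guide action pointing the end effector toward the goal is blended with the policy's action, and the guide's weight decays over episodes so the learned policy gradually takes over. The class name, the guide heuristic, and the decay schedule are illustrative assumptions, not the authors' exact formulation.

    import numpy as np

    class GuidedActionSelector:
        """Hypothetical sketch of a dynamic action selection strategy.

        Early in training the heuristic guide item dominates, which makes
        positive samples (progress toward the goal) far more likely; its
        weight decays each episode until the policy acts alone.
        """

        def __init__(self, guide_weight=1.0, decay=0.995, min_weight=0.0):
            self.guide_weight = guide_weight  # current influence of the guide item
            self.decay = decay                # per-episode multiplicative decay
            self.min_weight = min_weight      # floor for the guide's influence

        def guide_action(self, ee_pos, goal_pos, max_step=0.05):
            # Heuristic guide: a small Cartesian step toward the goal.
            direction = np.asarray(goal_pos) - np.asarray(ee_pos)
            norm = np.linalg.norm(direction)
            if norm < 1e-6:
                return np.zeros_like(direction)
            return max_step * direction / norm

        def select(self, policy_action, ee_pos, goal_pos):
            # Convex combination of the guide item and the policy's action.
            w = self.guide_weight
            return w * self.guide_action(ee_pos, goal_pos) \
                + (1.0 - w) * np.asarray(policy_action)

        def end_episode(self):
            self.guide_weight = max(self.min_weight,
                                    self.guide_weight * self.decay)

The combinatorial reward can be sketched in the same spirit: a standard artificial-potential-field term (quadratic attraction to the goal, repulsion inside an influence radius around the obstacle) summed with a time-energy penalty. The gains, the influence radius, and the exact functional forms below are assumptions for illustration, not the paper's published coefficients.

    def combinatorial_reward(ee_pos, goal_pos, obstacle_pos,
                             joint_velocities, dt,
                             k_att=1.0, k_rep=0.5, d0=0.2,
                             w_time=0.01, w_energy=0.001):
        # Attractive potential: quadratic in the end-effector-to-goal distance.
        d_goal = np.linalg.norm(np.asarray(goal_pos) - np.asarray(ee_pos))
        r_att = -k_att * d_goal ** 2

        # Repulsive potential: active only within the influence radius d0.
        d_obs = np.linalg.norm(np.asarray(obstacle_pos) - np.asarray(ee_pos))
        r_rep = -k_rep * (1.0 / d_obs - 1.0 / d0) ** 2 if 1e-6 < d_obs < d0 else 0.0

        # Time-energy term: penalize elapsed time and actuation effort.
        r_te = -w_time * dt - w_energy * float(np.sum(np.square(joint_velocities)))

        return r_att + r_rep + r_te

A dense reward of this shape gives the DDPG, TD3, and SAC critics a learning signal long before the sparse goal-reached event fires, which is consistent with the convergence improvement the abstract reports.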


Availability of Code and Data

The code and datasets generated and/or analysed during the current study are available from the corresponding author on reasonable request.


Acknowledgements

This work was supported by the National Key R&D Program of China (Grant number: 2018YFB1307400).

Funding

This work was supported by the National Key R&D Program of China (Grant number: 2018YFB1307400).

Author information

Contributions

All authors contributed to the study conception and design. Conceptualization, methodology, software, data curation, writing (original draft preparation), and investigation were performed by Li Zheng. Data curation was also performed by YaHao Wang, Run Yang, Shaolei Wu, Rui Guo and Erbao Dong. Supervision, writing, reviewing, and editing were performed by Erbao Dong. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Erbao Dong.

Ethics declarations

Ethics approval

Not applicable

Consent to participate

Not applicable

Consent for Publication

The authors affirm that human research participants provided informed consent for publication of the images in Fig. 1.

Competing interests

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zheng, L., Wang, Y., Yang, R. et al. An Efficiently Convergent Deep Reinforcement Learning-Based Trajectory Planning Method for Manipulators in Dynamic Environments. J Intell Robot Syst 107, 50 (2023). https://doi.org/10.1007/s10846-023-01822-5

