Abstract
Collaborative and autonomous robots are increasingly important in meeting the demands of a faster and more cost-effective market. To ensure production efficiency and safety, robots must respond swiftly to the presence of human operators or other dynamic obstacles, avoiding potential collisions by quickly planning alternative paths. Deep Reinforcement Learning (DRL)-based methods have shown great potential in path planning owing to their rapid response capabilities. However, existing DRL-based planners lack a safety verification system to evaluate the feasibility of actions generated by neural models, and they cannot guarantee 100% collision-free paths. This paper presents an enhanced DRL-based path planning system that incorporates a robust safety verification mechanism, which predicts potential collisions and generates alternative collision-free paths as necessary. We analyzed the essential elements of trajectory planning with the DRL method and proposed improvements to accelerate planning. The results demonstrate that our planner consistently generates paths for typical reaching tasks with an average planning time of 12.1 ms, a notable improvement over traditional algorithms. Moreover, the paths produced by our method are nearly optimal, comparable to those generated by optimization-based algorithms.
Data Availability
Data will be made available upon reasonable request.
References
Zabalza, J., Fei, Z., Wong, C., Yan, Y., Mineo, C., Yang, E., Rodden, T., Mehnen, J., Pham, Q., Ren, J.: Smart sensing and adaptive reasoning for enabling industrial robots with interactive human-robot capabilities in dynamic environments: a case study. Sensors 19(6), 1354 (2019)
Nicola, G., Ghidoni, S.: Deep Reinforcement Learning for Motion Planning in Human Robot Cooperative Scenarios. In: 2021 26th IEEE International Conference on Emerging Technologies and Factory Automation (ETFA). IEEE, 1–7 (2021)
Li, S., Han, K., Li, X., Zhang, S., Xiong, Y., Xie, Z.: Hybrid trajectory replanning-based dynamic obstacle avoidance for physical human-robot interaction. J. Intell. Rob. Syst. 103(3), 1–14 (2021)
LaValle, S.: Rapidly-exploring random trees: a new tool for path planning. Tech. Rep. TR 98-11, Iowa State University (1998)
Long, H., Li, G., Zhou, F., Chen, T.: Cooperative dynamic motion planning for dual manipulator arms based on RRT*Smart-AD algorithm. Sensors 23(18), 7759 (2023)
Yuan, C., Shuai, C., Zhang, W.: A dynamic multiple-query RRT planning algorithm for manipulator obstacle avoidance. Appl. Sci. 13(6), 3394 (2023)
Yu, Y., Zhang, Y.: Collision avoidance and path planning for industrial manipulator using slice-based heuristic fast marching tree. Robot. Comput.-Integr. Manuf. 75, 102289 (2022)
Merckaert, K., Convens, B., Nicotra, M., Vanderborght, B.: Real-time constraint-based planning and control of robotic manipulators for safe human-robot collaboration. Robot. Comput.-Integr. Manuf. 87, 102711 (2024)
Wei, S., Liu, B., Yao, M., Yu, X., Tang, L.: Efficient online motion planning method for the robotic arm to pick-up moving objects smoothly with temporal constraints. Proc. Inst. Mech. Eng. 236(15), 8650–8662 (2022)
Dam, T., Chalvatzaki, G., Peters, J., Pajarinen, J.: Monte-Carlo robot path planning. IEEE Robot. Autom. Lett. 7(4), 11213–11220 (2022)
Cao, X., Zou, X., Jia, C., Chen, M., Zeng, Z.: RRT-based path planning for an intelligent litchi-picking manipulator. Comput. Electron. Agric. 156, 105–118 (2019)
Yuan, C., Liu, G., Zhang, W., Pan, X.: An efficient RRT cache method in dynamic environments for path planning. Robot. Auton. Syst. 131, 103595 (2020)
Zhang, H., Wang, Y., Zheng, J., Yu, J.: Path planning of industrial robot based on improved RRT algorithm in complex environments. IEEE Access 6, 53296–53306 (2018)
Ichter, B., Harrison, J., Pavone, M.: Learning sampling distributions for robot motion planning. In: 2018 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 7087–7094 (2018)
Wang, J., Chi, W., Li, C., Wang, C., Meng, M.: Neural RRT*: learning-based optimal path planning. IEEE Trans. Autom. Sci. Eng. 17(4), 1748–1758 (2020)
Ma, N., Wang, J., Liu, J., Meng, M.: Conditional generative adversarial networks for optimal path planning. IEEE Trans. Cogn. Dev. Syst. 14(2), 662–671 (2022)
Wang, Y., Wei, L., Du, K., Liu, G., Yang, Q., Wei, Y., Fang, Q.: An online collision-free trajectory generation algorithm for human-robot collaboration. Robot. Comput.-Integr. Manuf. 80, 102475 (2023)
Power, T., Berenson, D.: Learning a generalizable trajectory sampling distribution for model predictive control. IEEE Trans. Rob. 40, 2111–2127 (2024)
Lee, C., Song, K.: Path re-planning design of a cobot in a dynamic environment based on current obstacle configuration. IEEE Robot. Autom. Lett. 8(3), 1183–1190 (2023)
Jiang, L., Liu, S., Cui, Y., Jiang, H.: Path planning for robotic manipulator in complex multi-obstacle environment based on improved_RRT. IEEE/ASME Trans. Mechatron. 27(6), 4774–4785 (2022)
Ratliff, N., Zucker, M., Bagnell, J., Srinivasa, S.: CHOMP: Gradient optimization techniques for efficient motion planning. In: IEEE International Conference on Robotics and Automation (ICRA). IEEE, 489–494 (2009).
Kalakrishnan, M., Chitta, S., Theodorou, E., Pastor, P., Schaal, S.: STOMP: Stochastic trajectory optimization for motion planning. In: IEEE International Conference on Robotics and Automation (ICRA). IEEE, 4569–4574 (2011).
Park, C., Pan, J., Manocha, D.: ITOMP: Incremental Trajectory Optimization for Real-time Replanning in Dynamic Environments. In: Proceedings of the International Conference on Automated Planning and Scheduling (ICAPS) 22, 207–215 (2012)
Finean, M., Petrovic, L., Merkt, W., Markovic, I., Havoutis, I.: Motion planning in dynamic environments using context-aware human trajectory prediction. Robot. Auton. Syst. 166, 104450 (2023)
Dong, J., Mukadam, M., Dellaert, F., Boots, B.: Motion Planning as Probabilistic Inference using Gaussian Processes and Factor Graphs. In: Robotics: Science and Systems (RSS) (2016)
Finean, M., Merkt, W., Havoutis, I.: Simultaneous Scene Reconstruction and Whole-Body Motion Planning for Safe Operation in Dynamic Environments. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 3710–3717 (2021)
Kuntz, A., Bowen, C., Alterovitz, R.: Fast Anytime Motion Planning in Point Clouds by Interleaving Sampling and Interior Point Optimization. In: International Symposium on Robotics Research (ISRR). Springer, 929–945 (2020)
Alwala, K., Mukadam, M.: Joint Sampling and Trajectory Optimization over Graphs for Online Motion Planning. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 4700–4707 (2021).
Watkins, C. J. C. H.: Learning from delayed rewards. PhD Thesis, King's College, University of Cambridge (1989)
Salmaninejad, M., Zilles, S., Mayorga, R.: Motion Path Planning of Two Robot Arms in a Common Workspace. In: IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE, 45–51 (2020).
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., Riedmiller, M.: Playing Atari with Deep Reinforcement Learning. arXiv:1312.5602 (2013).
Petrenko, V., Tebueva, F., Ryabtsev, S., Gurchinsky, M.: Method of Controlling the Movement of an Anthropomorphic Manipulator in the Working Area With Dynamic Obstacle. In: 8th Scientific Conference on Information Technologies for Intelligent Decision Making Support (ITIDS). IEEE, 359–364 (2020).
Alam, M.S., Sudha, S.K.R., Somayajula, A.: AI on the Water: Applying DRL to Autonomous Vessel Navigation. arXiv:2310.14938 (2023)
Regunathan, R.D., Sudha, S.K.R., Alam, M.S., Somayajula, A.: Deep Reinforcement Learning Based Controller for Ship Navigation. Ocean Eng. 273, 113937 (2023)
Lillicrap, T., Hunt, J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv:1509.02971 (2015).
Li, Z., Ma, H., Ding, Y., Wang, C., Jin, Y.: Motion planning of six-dof arm robot based on improved DDPG algorithm. In: 2020 39th Chinese Control Conference (CCC). IEEE, 3954–3959 (2020).
Lindner, T., Milecki, A.: Reinforcement learning-based algorithm to avoid obstacles by the anthropomorphic robotic arm. Appl. Sci. 12, 6629 (2022)
Zeng, R., Liu, M., Zhang, J., Li, X., Zhou, Q., Jiang, Y.: Manipulator Control Method Based on Deep Reinforcement Learning. In: 2020 Chinese Control And Decision Conference (CCDC). IEEE, 415–420 (2020).
Um, D., Nethala, P., Shin, H.: Hierarchical DDPG for manipulator motion planning in dynamic environments. AI 3(3), 645–658 (2022)
Jose, J., Alam, M.S., Somayajula, A.S.: Navigating the Ocean with DRL: Path following for marine vessels. arXiv:2310.14932 (2023)
Fujimoto, S., van Hoof, H., Meger, D.: Addressing function approximation error in actor-critic methods. arXiv:1802.09477 (2018).
Wang, S., Yi, W., He, Z., Xu, J., Yang, L.: Safe reinforcement learning-based trajectory planning for industrial robot. In: IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE, 3471–3476 (2020).
Huang, Z., Chen, G., Shen, Y., Wang, R., Liu, C., Zhang, L.: An obstacle-avoidance motion planning method for redundant space robot via reinforcement learning. Actuators 12(2), 69 (2023)
Chen, P., Pei, J., Lu, W., Li, M.: A deep reinforcement learning based method for real-time path planning and dynamic obstacle avoidance. Neurocomputing 497, 64–75 (2022)
Schaul, T., Quan, J., Antonoglou, I., Silver, D.: Prioritized Experience Replay. arXiv:1511.05952 (2015)
Andrychowicz, M., Crow, D., Ray, A., Schneider, J., Fong, R., Welinder, P., ..., Zaremba, W.: Hindsight Experience Replay. arXiv:1707.01495 (2017)
Feng, X.: Consistent experience replay in high-dimensional continuous control with decayed hindsights. Machines 10, 856 (2022)
Kim, S., An, B.: Learning Heuristic A*: Efficient Graph Search using Neural Network. In: 2020 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 9542–9547 (2020)
Prianto, E., Park, J.H., Bae, J.H., Kim, J.S.: Deep reinforcement learning-based path planning for multi-arm manipulators with periodically moving obstacles. Appl. Sci. 11(6), 2587 (2021)
Ren, Z., Dong, K., Zhou, Y., Liu, Q., Peng, J.: Exploration via Hindsight Goal Generation. Adv. Neural Inf. Process Syst. 32 (2019).
Bing, Z., Brucker, M., Morin, F.O., Li, R., Su, X., Huang, K., Knoll, A.: Complex robotic manipulation via graph-based hindsight goal generation. IEEE Trans. Neural Netw. Learn. Syst. 33(12), 7863–7876 (2021)
Bing, Z. S., Alvarez, E., Cheng, L., Morin, F. O., Li, R., Su, X. J., ..., Knoll, A.: Robotic Manipulation in Dynamic Scenarios via Bounding-Box-Based Hindsight Goal Generation. IEEE Trans. Neural Netw. Learn. Syst. 34(8), 5037–5050 (2023).
Althoff, M., Dolan, J.M.: Online verification of automated road vehicles using reachability analysis. IEEE Trans. Rob. 30(4), 903–918 (2014)
Chan, C.C., Tsai, C.C.: Collision-free path planning based on new navigation function for an industrial robotic manipulator in human-robot coexistence environments. J. Chin. Inst. Eng. 43(6), 508–518 (2020)
Zhao, J. B., Zhao, Q., Wang, J. Z., Zhang, X., Wang, Y. L.: Path Planning and Evaluation for Obstacle Avoidance of Manipulator Based on Improved Artificial Potential Field and Danger Field. In: 33rd Chinese Control and Decision Conference (CCDC). IEEE, 3018–3025 (2021).
Tulbure, A., Khatib, O.: Closing the Loop: Real-Time Perception and Control for Robust Collision Avoidance with Occluded Obstacles. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 5700–5707 (2020).
Zhao, M., Lv, X.Q.: Improved manipulator obstacle avoidance path planning based on potential field method. J. Robot. 2020, 1–12 (2020)
Zhang, H., Zhu, Y.F., Liu, X.F., Xu, X.R.: Analysis of obstacle avoidance strategy for dual-arm robot based on speed field with improved artificial potential field algorithm. Electronics 10(15), 1850 (2021)
Elahres, M., Fonte, A., Poisson, G.: Evaluation of an artificial potential field method in collision-free path planning for a robot manipulator. In: 2nd International Conference on Robotics, Computer Vision and Intelligent Systems (ROBOVIS). 92–102 (2021).
Khatib, O.: Real-time obstacle avoidance for manipulators and mobile robots. In: Proceedings of the 1985 IEEE International Conference on Robotics and Automation. IEEE, 500–505 (1985).
Kavraki, L.E., Svestka, P., Latombe, J., Overmars, M.H.: Probabilistic roadmaps for path planning in high-dimensional configuration spaces. IEEE Trans. Robot. Autom. 12(4), 566–580 (1996)
Karaman, S., Frazzoli, E.: Sampling-based algorithms for optimal motion planning. Int. J. Robot. Res. 30(7), 846–894 (2011)
Mukadam, M., Dong, J., Yan, X., Dellaert, F., Boots, B.: Continuous-time Gaussian process motion planning via probabilistic inference. Int. J. Robot. Res. 37(11), 1319–1340 (2018)
Thakar, S., Rajendran, P., Kim, H., Kabir, A. M., Gupta, S. K.: Accelerating bi-directional sampling-based search for motion planning of non-holonomic mobile manipulators. In: 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE, 6711–6717 (2020).
Gammell, J.D., Srinivasa, S.S., Barfoot, T.D.: Batch Informed Trees (BIT*): Sampling-based optimal planning via the heuristically guided search of implicit random geometric graphs. In: 2015 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 3067–3074 (2015)
Schulman, J., Ho, J., Lee, A.X., Awwal, I., Bradlow, H., Abbeel, P.: Finding locally optimal, collision-free trajectories with sequential convex optimization. Robot. Sci. Syst. IX 9(1), 1–10 (2013)
Haarnoja, T., Zhou, A., Abbeel, P., Levine, S.: Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor. In: International Conference on Machine Learning (ICML). PMLR, 1861–1870 (2018)
Funding
This work was supported by The Ministry of Higher Education for the Fundamental Research Grant Scheme (FRGS/1/2022/TK10/UM/02/7) awarded to Ir. Dr. Hwa-Jen Yap (Universiti Malaya) and Application Innovation Project of Hebei Vocational University of Technology and Engineering (202205).
Author information
Authors and Affiliations
Contributions
Conceptualization: [JL], [HJY], [ASMK]; Methodology: [JL], [HJY], [ASMK]; Formal analysis and investigation: [JL]; Writing—original draft preparation: [JL]; Writing—review and editing: [HJY], [ASMK]; Funding acquisition, resources: [HJY], [JL]; Supervision: [HJY], [ASMK].
Corresponding author
Ethics declarations
Competing Interests
The authors have no competing interests to declare that are relevant to the content of this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix A
1.1 Implementation Details
A. Sampling-Based Algorithms
In sampling-based algorithms, the orientation of the end-effector remains constant throughout the planning process. Planning proceeds by first sampling positions for the Tool Center Point (TCP) within the workspace and then applying inverse kinematics to compute the joint angles for these positions. Collisions are detected with the ‘getContactPoints’ API in PyBullet.
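For concreteness, the following is a minimal sketch of this sample-IK-check step, assuming a connected PyBullet client and that robot_id, obstacle_ids, the end-effector link index, and the axis-aligned workspace bounds are available; all names are illustrative, not taken from our implementation.

```python
import random
import pybullet as p

def movable_joints(robot_id):
    """Indices of the robot's non-fixed joints (IK returns one value per DoF)."""
    return [j for j in range(p.getNumJoints(robot_id))
            if p.getJointInfo(robot_id, j)[2] != p.JOINT_FIXED]

def sample_free_config(robot_id, obstacle_ids, ee_link, tcp_orn, ws_min, ws_max):
    """Sample a TCP position in the workspace box, solve IK with a fixed
    end-effector orientation, and reject configurations in collision."""
    tcp_pos = [random.uniform(lo, hi) for lo, hi in zip(ws_min, ws_max)]
    q = p.calculateInverseKinematics(robot_id, ee_link, tcp_pos, tcp_orn)
    for joint, angle in zip(movable_joints(robot_id), q):
        p.resetJointState(robot_id, joint, angle)
    p.performCollisionDetection()  # refresh contacts without stepping physics
    for obs in obstacle_ids:
        if p.getContactPoints(bodyA=robot_id, bodyB=obs):
            return None  # sample rejected: the robot touches an obstacle
    return q
```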
1) Common Parameters for Sampling-Based Algorithms

- Max Samples: the algorithm can generate up to 1000 random samples to grow the tree.
- Step Length: the maximum step length for tree extension towards a sampled point is 0.1 m per iteration.
- Collision Checking: performed every 0.02 m when connecting path points to ensure no collisions.
- Goal Threshold: the goal is considered reached once the tree comes within 0.1 m of the goal position.

2) Specific Algorithm Parameters

- Bias Goal RRT: a bias factor of 0.5 is used, meaning that after determining the nearest node to a random sample, an additional 50% of the maximum step length is taken towards the goal (see the sketch after this list).
- RRT*: a search radius of 0.5 m is applied for local optimization to minimize path length.
- PRM: a sample size of 1000 is employed with a maximum step length of 0.1 m for smooth transitions, and 10 nearest neighbors are considered to improve connectivity.
- BIT*: as with PRM, a sample size of 1000 and a maximum step length of 0.1 m are used; heuristic and cost calculations further reduce path length.
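The Bias Goal RRT extension rule, as we describe it above, can be sketched as follows; this is a hedged NumPy illustration in which nodes, sample, and goal are assumed to be configuration-space points, and the per-segment collision checking (every 0.02 m) is left out for brevity.

```python
import numpy as np

STEP = 0.1        # maximum step length per extension (m)
GOAL_BIAS = 0.5   # extra fraction of STEP taken towards the goal

def unit(v):
    """Direction of v (zero-safe)."""
    n = np.linalg.norm(v)
    return v / n if n > 1e-9 else v

def extend(nodes, sample, goal):
    """One Bias Goal RRT extension: step from the nearest node towards the
    random sample, then take an extra 50% of STEP towards the goal."""
    nearest = min(nodes, key=lambda q: np.linalg.norm(q - sample))
    new = nearest + STEP * unit(sample - nearest)
    new = new + GOAL_BIAS * STEP * unit(goal - new)
    return nearest, new
```

The segment from nearest to new would then be collision-checked at 0.02 m intervals before new is added to the tree.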
B. Optimization-Based Algorithms
In optimization-based algorithms, the path planning problem is framed as an optimization task. The objective is to find a feasible trajectory that minimizes a cost function while meeting certain constraints. The general formulation for optimization-based motion planning can be expressed as:

$$\underset{S}{\mathrm{min}}\;\mathcal{F}(S)\quad \mathrm{s.t.}\quad g(S)\le 0,\;h(S)=0$$

where \(g\) and \(h\) collect the inequality and equality constraints given below.

1) Common Settings for Optimization-Based Algorithms

The objective function consists of multiple cost components, including path length, obstacle avoidance, smoothness, and roughness. The function \(\mathcal{F}\) is expressed as:

$$\mathcal{F}={R}_{l}+{R}_{s}+{R}_{r}+{R}_{o}$$

where: \({R}_{l}\) is the path length cost:

$${R}_{l}=\sum_{t=1}^{T-1}\Vert {S}_{t+1}-{S}_{t}\Vert$$

\({R}_{s}\) is the smoothness cost:

$${R}_{s}=\sum_{t=1}^{T-1}{\Vert {S}_{t+1}-{S}_{t}\Vert }^{2}$$

\({R}_{r}\) is the roughness cost [100]:

$${R}_{r}=\sum_{t=2}^{T-1}{\Vert {S}_{t+1}-2{S}_{t}+{S}_{t-1}\Vert }^{2}$$

\({R}_{o}\) is the obstacle cost:

$${R}_{o}=\alpha \sum_{t=1}^{T}\sum_{i}\mathrm{max}(0,\beta -{d}_{t,i})$$

Here, \({S}_{t}\) represents the position of the TCP at the \(t\)-th discrete point, and \(T=10\) is the total number of discrete points. \({d}_{t,i}\) is the distance between the robot and the \(i\)-th closest obstacle at the \(t\)-th waypoint, obtained from the closest points returned by the API ‘getClosestPoints’. \(\alpha =100\) is a scaling factor for the obstacle cost, and \(\beta =0.1\) m is the threshold distance below which the obstacle cost starts to increase.
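Under these definitions, the total cost can be evaluated as in the sketch below; the array shapes and function name are assumptions, and dists would be filled from the ‘getClosestPoints’ results at each waypoint.

```python
import numpy as np

ALPHA, BETA, T = 100.0, 0.1, 10  # obstacle scale, distance threshold (m), waypoints

def trajectory_cost(S, dists):
    """Total cost F = R_l + R_s + R_r + R_o for a discretized TCP path.

    S:     (T, 3) array of TCP positions S_t.
    dists: (T, n_obs) array of robot-obstacle distances d_{t,i}.
    """
    step = np.diff(S, axis=0)                          # S_{t+1} - S_t
    R_l = np.linalg.norm(step, axis=1).sum()           # path length
    R_s = (np.linalg.norm(step, axis=1) ** 2).sum()    # smoothness (1st differences)
    acc = np.diff(S, n=2, axis=0)                      # S_{t+1} - 2 S_t + S_{t-1}
    R_r = (np.linalg.norm(acc, axis=1) ** 2).sum()     # roughness (2nd differences)
    R_o = ALPHA * np.maximum(0.0, BETA - dists).sum()  # penalize distances below beta
    return R_l + R_s + R_r + R_o
```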
- Inequality Constraints: the joint angles \({\theta }_{j}(t)\) must stay within allowable limits:

$${\theta }_{j}^{min}\le {\theta }_{j}(t)\le {\theta }_{j}^{max},\quad j=1,\dots ,n \quad (15)$$

where \(n\) is the number of joints.

- Equality Constraints: the initial and final positions must coincide with the start and goal positions:

$$s\left({t}_{0}\right)={s}_{\text{start}},\quad s({t}_{f})={s}_{\text{goal}} \quad (16)$$

- Initialization: a straight-line trajectory is generated as an initial guess, and optimization refines the trajectory until either a convergence threshold of 1e-4 is reached or the maximum of 100 iterations is completed.

2) Specific Algorithm Parameters

- TrajOpt: gradients are computed by perturbing trajectory points by a small value (1e-5) and observing the changes in cost. The trajectory points are iteratively updated using a learning rate of 0.01 until convergence or the maximum number of iterations (see the sketch after this list).
- CHOMP: like TrajOpt, CHOMP computes gradients through perturbation (epsilon = 1e-5) but updates points using covariant gradient descent. A Hessian approximation is used to speed up the process.
- STOMP: this algorithm uses a stochastic method, generating noisy trajectories with random perturbations (standard deviation of 0.01) and then updating points based on the costs of these trajectories, iterating until convergence or the iteration limit.
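The TrajOpt-style loop described above can be sketched as follows, assuming cost_fn is a single-argument callable (e.g., a closure over the current obstacle distances built from trajectory_cost above). The first and last waypoints are held fixed, which enforces the equality constraints (16); CHOMP and STOMP differ mainly in how the update step is computed.

```python
import numpy as np

EPS, LR = 1e-5, 0.01        # perturbation size and learning rate
MAX_ITERS, TOL = 100, 1e-4  # iteration limit and convergence threshold

def refine(S, cost_fn):
    """Finite-difference descent on a trajectory S of shape (T, 3)."""
    prev = cost_fn(S)
    for _ in range(MAX_ITERS):
        grad = np.zeros_like(S)
        for t in range(1, len(S) - 1):          # interior waypoints only
            for k in range(S.shape[1]):
                S[t, k] += EPS                  # perturb one coordinate
                grad[t, k] = (cost_fn(S) - prev) / EPS
                S[t, k] -= EPS                  # undo the perturbation
        S[1:-1] -= LR * grad[1:-1]              # gradient step, endpoints pinned
        cur = cost_fn(S)
        if abs(prev - cur) < TOL:               # convergence threshold of 1e-4
            break
        prev = cur
    return S
```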
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Liu, J., Yap, H.J. & Khairuddin, A.S.M. Path Planning for the Robotic Manipulator in Dynamic Environments Based on a Deep Reinforcement Learning Method. J Intell Robot Syst 111, 3 (2025). https://doi.org/10.1007/s10846-024-02205-0