ABSTRACT
To address the UAV path planning problem in complex unknown environments, a twin delayed deep deterministic policy gradient (TD3) algorithm based on composite experience replay is proposed. First, the TD3 algorithm is combined with an LSTM neural network; the experience replay mechanism and the reward function are then improved so that the UAV avoids obstacles more effectively in both dynamic and static environments. Simulation experiments in a purpose-built environment show that, compared with the original algorithm, the improved algorithm avoids obstacles more efficiently and stably, and helps UAVs plan better paths in unknown environments containing multiple obstacles.
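The abstract does not specify how the composite experience replay is organized. As one hedged illustration (not the paper's actual implementation), a buffer might keep collision and goal-reaching transitions in separate pools and draw each training minibatch from both, so rare but informative outcomes are not crowded out by ordinary transitions; the class and parameter names below are hypothetical:

```python
import random
from collections import deque


class CompositeReplayBuffer:
    """Illustrative composite experience replay (assumed design, not the
    paper's): transitions are stored in two pools, and each minibatch
    mixes samples from both at a configurable ratio."""

    def __init__(self, capacity=10000, success_ratio=0.5):
        self.success = deque(maxlen=capacity)   # e.g. goal-reaching episodes
        self.failure = deque(maxlen=capacity)   # e.g. collision episodes
        self.success_ratio = success_ratio      # target fraction from success pool

    def add(self, transition, succeeded):
        # Route the (s, a, r, s', done) tuple to the matching pool.
        (self.success if succeeded else self.failure).append(transition)

    def sample(self, batch_size):
        # Draw from both pools; fall back gracefully if one pool is small.
        n_succ = min(int(batch_size * self.success_ratio), len(self.success))
        n_fail = min(batch_size - n_succ, len(self.failure))
        batch = random.sample(list(self.success), n_succ) + \
                random.sample(list(self.failure), n_fail)
        random.shuffle(batch)
        return batch
```

Under this sketch, the TD3 critic update would simply call `sample(batch_size)` in place of uniform sampling; the ratio controls how strongly informative episodes are emphasized.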
Index Terms
- UAV path planning based on improved TD3 algorithm