The autonomous navigation and obstacle avoidance for USVs with ANOA deep reinforcement learning method
Introduction
An unmanned surface vehicle (USV) is a small surface ship with the ability of autonomous planning and navigation, which can accomplish missions such as environmental perception and target detection in either autonomous mode or under manual intervention. Unmanned aerial vehicles (UAVs), unmanned ground vehicles (UGVs), unmanned underwater vehicles (UUVs) and USVs are important parts of unmanned systems, and their cooperative operations jointly construct a holonomic unmanned marine system [1], [2]. Once equipped with multiple sensors, communication devices and advanced control devices, USVs become flexible and intelligent enough to carry out different missions such as marine detection and water quality measurement.
Different missions require USVs to be deployed in various marine areas, especially in areas too harsh and dangerous for large crewed ships. There is therefore a high demand for autonomous navigation and obstacle avoidance for USVs, i.e., finding an optimal or approximately optimal route from the starting point to the target under certain constraints, so that USVs can navigate past all obstacles without collisions.
With recent theoretical and technical achievements, especially in reinforcement learning and deep learning, the development of unmanned systems [3], [4], [5] has been dramatically promoted. Traditional navigation and path planning techniques include the graphic method, the dynamic window method [6], the artificial potential field method and so on. There have also been heuristic path planning algorithms, including genetic algorithms [7], [8] and swarm intelligence algorithms [9], [10], [11], [12]. Each kind of approach has strengths and weaknesses. Traditional methods are prone to local traps in complex environments and, compared with heuristic techniques, have a lower probability of reaching the destination along a reasonable route. Some heuristic methods are slow and, in some cases, unable to detect and avoid obstacles in real time. Reinforcement learning algorithms rely on a reward-and-punishment mechanism to improve mission performance. The balance between random exploration and greedy exploitation in reinforcement learning is especially suitable for path planning in sophisticated environments: sufficient exploration helps the agent discover better navigation routes.
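The exploration–greedy trade-off mentioned above is commonly realized as an epsilon-greedy policy with a decaying exploration rate. The sketch below is a minimal illustration of that standard scheme; the function names and decay schedule are assumptions, not taken from the paper:

```python
import random

def epsilon_greedy(q_values, epsilon):
    """With probability epsilon pick a random action (explore),
    otherwise pick the action with the highest Q value (exploit)."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def decayed_epsilon(step, eps_start=1.0, eps_end=0.05, decay_steps=10_000):
    """Linearly anneal epsilon: explore heavily early in training,
    then exploit the learned policy more as training progresses."""
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)
```

With `epsilon = 0` the policy is purely greedy; early in training, a large epsilon drives the random exploration that helps discover good routes.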
Inspired by the theoretical achievements of reinforcement learning and deep learning, the ANOA method is proposed for the autonomous navigation of USVs. The main contributions of this paper can be summarized as follows:
- With a tailored design of state and action spaces and a dueling deep Q-network, a deep reinforcement learning method, ANOA, is proposed for the autonomous navigation and obstacle avoidance of USVs. It outperforms the deep Q-network (DQN) and Deep Sarsa in both static and dynamic environments. Furthermore, a realistic control model of USVs moving in surge, sway and yaw is integrated with both the proposed ANOA and a frequently used heuristic approach, Recast Navigation. In the dynamic environment, the fully trained ANOA achieves a higher success rate than Recast Navigation.
- A dueling deep Q-network is proposed as the deep learning module that senses the sea environment around the USV for informative feature learning. This network is trained with a variant of Q-learning; its input is raw pixels and its output is a value function estimating future rewards, also known as Q values. The reinforcement learning part interacts with the deep learning part by using its Q-value estimates to make decisions. The deep learning part works within a finite memory space, yet performs as well as a well-optimized Q table in traditional Q-learning.
- For the autonomous navigation and obstacle avoidance of USVs, the pros and cons of different methods are discussed on a simulation platform constructed with open-source tools. On this platform, ANOA, DQN and Deep Sarsa are quantitatively evaluated with the reward, loss value and average Q-value in both static and dynamic environments. Moreover, ANOA and Recast Navigation are quantitatively compared by success rate in the dynamic environment.
The contribution to the sustainability and potential replication of this research extends to all unmanned vehicles: the proposed ANOA method and the simulation platform for USVs could, with adaptation, also be used for UAVs, UGVs and UUVs.
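The dueling aggregation described in the second contribution splits the Q function into a state value V(s) and per-action advantages A(s, a). The sketch below illustrates the standard dueling-DQN combining formula in NumPy; it is a minimal illustration of the formula, not the authors' exact network:

```python
import numpy as np

def dueling_q(value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a).
    Subtracting the mean advantage makes the V/A decomposition identifiable,
    since adding a constant to V and subtracting it from A would otherwise
    leave Q unchanged."""
    return value + advantages - advantages.mean(axis=-1, keepdims=True)

# Toy example: one state, a value estimate of 2.0, and three action advantages.
v = np.array([[2.0]])
a = np.array([[1.0, 0.0, -1.0]])
q = dueling_q(v, a)  # mean advantage is 0 here, so Q = V + A
```

A useful property of this aggregation is that the mean of the resulting Q values over actions equals the state value V(s).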
Related work
In changeable and complex marine areas, USVs have to explore the marine environment by self-navigation to reach destinations safely. To be intelligent and autonomous, USVs should be aware of surrounding obstacles with the help of different types of sensors. With real-time environmental awareness, USVs should be capable of making correct decisions, without human control or intervention, aiming at predetermined destinations.
Path planning has been a popular research topic for years. On one
Problem formulation
For the autonomous navigation of USVs, the most important goal is to reach the predetermined target without colliding with obstacles in the environment. Success in autonomous navigation relies heavily on efficient methods, which include the following parts:
(1) Instant decision-making for movement strategies immediately after getting observations of different environments.
(2) Sequences of control actions for USVs without stop even in emergency.
(3) Each action of USVs complying with their designs and easy
The simulation platform
To carry out experiments on autonomous navigation and obstacle avoidance for USVs, a simulation platform is constructed as demonstrated in Fig. 4. The platform is implemented with the Unity Machine Learning Agents Toolkit (ML-Agents) [43], an open-source Unity plugin that enables games and simulations to serve as environments for training intelligent agents. Agents can be trained using reinforcement learning, imitation learning, neuroevolution, or other machine learning methods
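Training against such a platform follows the usual agent–environment loop: reset the environment, choose an action, observe the next state and reward, and repeat until the episode ends. The sketch below illustrates that cycle with a hypothetical grid-world stand-in (`MockUsvEnv` is an assumption for illustration, not the actual Unity ML-Agents API):

```python
import random

class MockUsvEnv:
    """Hypothetical grid-world stand-in for the Unity ML-Agents environment.
    The USV starts at (0, 0) and must reach the opposite corner."""

    def __init__(self, size=10):
        self.size = size
        self.reset()

    def reset(self):
        self.pos = (0, 0)
        return self.pos

    def step(self, action):
        # Four discrete moves: +x, -x, +y, -y, clipped to the grid.
        dx, dy = [(1, 0), (-1, 0), (0, 1), (0, -1)][action]
        x = min(max(self.pos[0] + dx, 0), self.size - 1)
        y = min(max(self.pos[1] + dy, 0), self.size - 1)
        self.pos = (x, y)
        done = self.pos == (self.size - 1, self.size - 1)
        # Small per-step penalty encourages short routes; bonus on arrival.
        reward = 10.0 if done else -0.1
        return self.pos, reward, done

# Agent-environment loop with a random policy (an RL agent would go here).
env = MockUsvEnv()
state = env.reset()
total_reward = 0.0
for _ in range(200):
    action = random.randrange(4)
    state, reward, done = env.step(action)
    total_reward += reward
    if done:
        break
```

In the actual platform, the `reset`/`step` exchange is mediated by the ML-Agents communication layer between the Python trainer and the Unity simulation.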
Discussions
The autonomous navigation and obstacle avoidance for USVs is of scientific significance and practical value since USVs could get to marine areas dangerous for ships with sailors. With the help of different types of sensors, USVs should be aware of surrounding obstacles during the autonomous navigation. A qualified autonomous navigation algorithm should be able to ensure instant decision-making for movement strategies immediately after getting observation of environments, sequences of control
Conclusions
To enhance the intelligence of USVs in the sophisticated marine environment, the ANOA algorithm is proposed for real-time path planning with obstacle avoidance. On the constructed simulation platform, multiple experiments are carried out to verify the effectiveness and efficiency of ANOA algorithm. According to the experimental results, ANOA outperforms deep Q-network (DQN) and Deep Sarsa in the efficiency of exploration and the speed of convergence not only in static environment but also in
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This paper is supported by the National Natural Science Foundation of China (Grant No. 61625304) and by the State Key Program of National Nature Science Foundation of China (Grant No. 61936001).
References (45)
- et al., Enhanced discrete particle swarm optimization path planning for UAV vision-based surface inspection, Autom. Constr. (2017)
- et al., Genetic algorithm based approach for autonomous mobile robot path planning, Procedia Comput. Sci. (2018)
- et al., A survey of deep neural network architectures and their applications, Neurocomputing (2017)
- et al., Framework for control and deep reinforcement learning in traffic
- et al., Deep reinforcement learning with experience replay based on SARSA
- Markov games as a framework for multi-agent reinforcement learning, Mach. Learn. Proc. (1994)
- et al., Robust adaptive path following of underactuated ships, Automatica (2004)
- et al., Autonomous navigation and sensorless obstacle avoidance for UGV with environment information from UAV
- et al., Decentralized planning and control for UAV–UGV cooperative teams, Auton. Robots (2018)
- et al., Distributed deep reinforcement learning: learn how to play Atari games in 21 minutes
- Deep reinforcement learning framework for autonomous driving, Electron. Imaging
- End-to-end deep reinforcement learning for lane keeping assist
- An efficient backtracking-based approach to turn-constrained path planning for aerial mobile robots
- Path-planning of automated guided vehicle based on improved Dijkstra algorithm
- Improved genetic algorithm route planning based on multiple constraints
- A multi-UAV minimum time search planner based on ACOR
- Path planning of robot based on improved ant colony algorithm, Meas. Control Technol.
- Application of improved particle swarm optimization algorithm in path planning of mobile robot, Fire Control Command Control
- Path planning of mobile robot based on improved artificial potential field method, Appl. Mech. Mater.
- Comparison of A* and dynamic pathfinding algorithm with dynamic pathfinding algorithm for NPC on car racing game
- Path planning method for mobile robot based on improved genetic algorithm, Electron. World
- Crowd intelligence in AI 2.0 era, Front. Inf. Technol. Electron. Eng.