Review Article
Visual SLAM for underwater vehicles: A survey
Introduction
With the development of robot technology, the Autonomous Underwater Vehicle (AUV) has become one of the important means of exploring and exploiting marine resources. Accurate positioning and navigation play a critical role in ensuring that underwater vehicles move stably and complete their tasks successfully. Because radio signals, including GPS signals, attenuate rapidly in water, GPS is difficult to use for autonomous navigation of underwater vehicles. Long Baseline (LBL), Short Baseline (SBL), Ultra Short Baseline (USBL) [1] and similar methods rely on nearby ships or other carriers to transmit signals to underwater vehicles, making them unsuitable for long-distance operation. In contrast, Simultaneous Localization and Mapping (SLAM) [2] uses onboard sensors to collect and process data, and from the processed results estimates the position of the vehicle, making autonomous localization and navigation of underwater vehicles possible [3]. As an autonomous navigation method, SLAM has made great progress in recent years, with wide application in robotics [4] and autonomous driving [5]. Unlike methods that rely on external information, SLAM needs only its own sensors to obtain real-time information about the surrounding environment, and can build maps and localize the vehicle without any prior input. Vehicles can therefore achieve truly autonomous navigation and positioning in unfamiliar environments [3]. Underwater SLAM thus plays an increasingly important role in underwater vehicle navigation.
According to the type of sensor, underwater SLAM can be divided into Light Detection and Ranging (LiDAR) SLAM, sonar SLAM, and visual SLAM. LiDAR and sonar equipment are expensive, so they are not well suited to civil robots. LiDAR uses laser light to analyze the contour and structure of targets. However, because of suspended particles, laser light is absorbed and scattered in water, which affects the measurement results. The working range of LiDAR in underwater environments is therefore limited, and maps constructed by LiDAR lack semantic information. Sonar uses a transmitter to emit sound waves and a receiver to receive the echoes, then analyzes and processes the echo signals to describe the contour and structure of the target. Underwater sound propagation is not affected by light, so sonar is a good choice for underwater SLAM [6]. However, acoustic waves are significantly affected by water flow, seismic activity, ship traffic, marine life and other factors. In addition, in some special cases, such as underwater caves and other enclosed spaces, sound waves are reflected many times, eventually causing interference. All of these factors pose major challenges for underwater positioning and mapping. In contrast, vision-based SLAM has become a hot research field in recent years due to its low cost and high portability, although it is affected by both particles and lighting conditions under water. Various underwater image enhancement algorithms [7] can relieve these difficulties to some extent. A comparison of the three SLAM methods is presented in Table 1.
To position and map in an unknown environment, sensors are used to obtain the key features of the environment, and the current state of the vehicle is estimated from this information together with its previous states. As the vehicle keeps moving, estimation errors inevitably accumulate. To correct these errors and ensure long-term stable operation, loop closure detection is needed.
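The estimate-from-previous-state loop described above can be sketched as a one-dimensional Kalman-style filter (purely illustrative; the motion input `u`, observation `z`, and noise levels `q` and `r` below are made-up values, not parameters from this survey):

```python
import numpy as np

def predict(x, P, u, q=0.1):
    """Propagate the previous state with the motion model x' = x + u;
    the process noise q inflates the uncertainty P."""
    return x + u, P + q

def update(x, P, z, r=0.05):
    """Correct the prediction with a direct observation z = x + noise."""
    K = P / (P + r)                  # Kalman gain
    return x + K * (z - x), (1 - K) * P

# one motion step followed by one observation
x, P = 0.0, 1.0
x, P = predict(x, P, u=1.0)          # dead-reckoned state, larger uncertainty
x, P = update(x, P, z=0.9)           # observation shrinks the uncertainty
```

Without the update step, `P` only grows with each prediction, which is exactly the error accumulation that loop closure detection is meant to bound.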
The process of visual SLAM can be divided into five parts: sensor data, front-end, back-end, loop closure detection, and mapping, as shown in Fig. 1. In visual SLAM, the sensors mainly include cameras, as well as some internal sensors of the vehicle, such as an Inertial Measurement Unit (IMU), a depth sensor and so on. The front-end is often called Visual Odometry (VO); it mainly provides optimized sensor data for the back-end. The back-end optimizes and updates the state of the vehicle based on the data from the front-end and from loop closure detection, and then calculates the trajectory of the vehicle and the map of its surrounding environment. Loop closure detection decides whether the vehicle has returned to a previously visited position and counters the drift [8] that accumulates over time.
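The pipeline above can be caricatured as a few cooperating functions (an illustrative skeleton only: the fixed displacement returned by `front_end`, the revisit radius, and the pass-through `back_end` are placeholders, and mapping is omitted):

```python
import numpy as np

def front_end(prev_frame, frame):
    """Visual odometry stage: estimate the relative motion between
    adjacent frames. Placeholder: returns a fixed 2-D displacement."""
    return np.array([1.0, 0.0])

def loop_closure(pose, visited, radius=0.5):
    """Declare a loop when the current pose re-enters an earlier place."""
    return any(np.linalg.norm(pose - v) < radius for v in visited[:-1])

def back_end(trajectory, loops):
    """Global optimization stage; here a pass-through placeholder."""
    return trajectory

# wire the stages together: integrate relative motions into a trajectory
pose = np.zeros(2)
trajectory = [pose.copy()]
loops = []
for step in range(3):
    pose = pose + front_end(None, None)   # front-end: incremental motion
    trajectory.append(pose.copy())
    if loop_closure(pose, trajectory):    # loop closure: revisit check
        loops.append(step)
trajectory = back_end(trajectory, loops)  # back-end: refine the estimate
```

In a real system each placeholder is replaced by substantial machinery, but the data flow between the stages follows this shape.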
SLAM technology is quite mature for static, rigid scenes with little illumination change and limited interference [9]. However, unlike controlled ground or indoor environments, the underwater environment is highly unstructured, with many kinds of noise and interference, which brings numerous difficulties and challenges to underwater visual SLAM. For instance, due to scattering and absorption, light attenuates in water, so image contrast becomes low. Moreover, different wavelengths of light attenuate at different rates in water: red light is absorbed fastest, while blue-green light penetrates farthest, giving underwater images a blue-green cast. Dissolved organic matter and suspended particles in the water introduce strong noise. Underwater scenes often show a monotonous structure and lack rich features, which makes feature detection and matching difficult. Therefore, underwater SLAM is often much harder to implement than SLAM on the ground. As shown in Fig. 2, unstructured underwater scenes often include sunken buses, caves, ships, rocks, seaweed, corals, etc.
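As a rough illustration of how enhancement can counteract the blue-green cast, the sketch below compensates the attenuated red channel toward the green channel and then stretches contrast (a simplified stand-in for the algorithms surveyed in [7]; the compensation formula and the synthetic image are assumptions, not a method from this paper):

```python
import numpy as np

def enhance_underwater(img):
    """Toy enhancement for blue-green underwater images (float RGB in [0, 1]):
    compensate the attenuated red channel using the green channel, then
    stretch each channel's contrast to the full range."""
    img = img.astype(np.float64)
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    # red-channel compensation, loosely inspired by fusion-based methods
    r = r + (g.mean() - r.mean()) * (1.0 - r) * g
    out = np.stack([r, g, b], axis=-1)
    # per-channel min-max contrast stretch
    mn = out.reshape(-1, 3).min(axis=0)
    mx = out.reshape(-1, 3).max(axis=0)
    return (out - mn) / np.maximum(mx - mn, 1e-8)

# synthetic blue-green image: weak red, stronger green and blue
rng = np.random.default_rng(0)
img = rng.uniform(size=(32, 32, 3)) * np.array([0.2, 0.7, 0.8])
out = enhance_underwater(img)
```

Restoring the red channel and contrast in this way tends to make feature detection and matching more reliable downstream.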
Underwater SLAM has attracted wide attention from researchers. For example, [3], [10] summarized the state optimization algorithms commonly used in underwater SLAM, and [11] reviewed underwater acoustic SLAM from the perspective of sonar image registration and loop closure detection. In recent years, underwater visual SLAM has developed rapidly and played an important role in marine resource exploration, yet a systematic review of it is still lacking. Therefore, after consulting the literature on underwater visual SLAM from the past 15 years on Web of Science, IEEE Xplore and Google Scholar, this paper starts from the framework of visual SLAM, summarizes the development of underwater visual SLAM in recent years, and mainly introduces its four parts: related sensors, front-end visual odometry, back-end state optimization, and loop closure detection, as shown in Fig. 3. For positioning, a map can be a simple set of landmarks that meets the requirements of the task; once the locations of the landmarks are determined, the map is constructed. Therefore, this paper does not describe the mapping process at great length. The structure of the paper is organized as follows. Section 1 summarizes the basic situation of underwater SLAM, compares the three underwater SLAM methods, introduces the basic framework of visual SLAM, and highlights the difficulties of the special underwater environment; Sections 2, 3, 4 and 5 introduce the basic content and research status of underwater visual SLAM, covering the related sensors, front-end visual odometry, back-end state optimization and loop closure detection, respectively; Section 6 discusses difficulties and challenges in the field of underwater visual SLAM; Section 7 gives a summary and outlook.
Proprioceptive sensors
The sensors of underwater vehicles can be divided into proprioceptive and exteroceptive sensors. The latter are mainly used to perceive the external environment, while the former estimate the state and position of the vehicle itself without external assistance. In addition to the necessary external information, the realization of underwater SLAM often requires proprioceptive sensors to provide information such as depth, orientation, and acceleration. Fig. 4 illustrates the common sensors
Front-end visual odometry
The front-end visual odometry roughly estimates the pose of the camera from information acquired in adjacent images, and provides a good initial value for the back-end. Visual SLAM without loop closure detection is also called visual odometry. Writing the camera pose at time $k$ as the homogeneous transform $T_k = \begin{bmatrix} R_k & t_k \\ 0 & 1 \end{bmatrix}$, where $R_k$ is the rotation matrix and $t_k$ the translation vector of the camera, and starting from a known initial pose $T_0$, the camera pose at any time can be obtained by chaining the relative motions $\Delta T_k$ estimated between adjacent frames: $T_k = T_{k-1}\,\Delta T_k$ (1)
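The chaining of per-frame relative motions can be sketched with 4x4 homogeneous transforms (a minimal numpy illustration; the 90-degree planar increments are made-up values):

```python
import numpy as np

def make_T(R, t):
    """Build a 4x4 homogeneous transform from rotation R (3x3) and translation t (3,)."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def rot_z(theta):
    """Rotation about the z-axis by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

# start at a known initial pose T_0 = I, then chain per-frame increments:
# T_k = T_{k-1} @ dT_k, with each dT_k estimated by the front-end
T = np.eye(4)
for _ in range(4):
    dT = make_T(rot_z(np.pi / 2), np.array([1.0, 0.0, 0.0]))
    T = T @ dT

# four 90-degree turns with unit forward steps trace a closed square,
# so the final pose returns to the identity
```

Because every new pose multiplies in the latest increment, any error in a single estimate propagates into all subsequent poses, which is why cumulative drift arises.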
Back-end state optimization
SLAM is essentially an estimation of the uncertainty of the agent itself and the surrounding space [2]. State optimization is the core of SLAM. Visual odometry gives only a short-term pose estimate, so the process inevitably produces cumulative errors, and over time the estimate becomes increasingly unreliable. On the basis of the visual odometry, the back-end realizes state optimization on a larger spatial scale and over a longer time span. For underwater
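One common back-end formulation is to stack all relative-motion and loop-closure constraints into a least-squares problem. The 1-D toy below (made-up measurements, unit weights) shows how a single loop-closure constraint redistributes the accumulated odometry error:

```python
import numpy as np

# 1-D pose-graph toy: five poses linked by noisy odometry, plus one
# loop-closure constraint saying pose 4 is back at pose 0.
odometry = [1.1, 0.9, 1.05, -3.2]   # relative measurements x_{k+1} - x_k
loop = (0, 4, 0.0)                   # constraint: x_4 - x_0 = 0

# stack all constraints into the linear system A x = b
rows, b = [], []
for k, dz in enumerate(odometry):
    r = np.zeros(5); r[k] = -1; r[k + 1] = 1
    rows.append(r); b.append(dz)
i, j, dz = loop
r = np.zeros(5); r[i] = -1; r[j] = 1
rows.append(r); b.append(dz)
rows.append(np.eye(5)[0]); b.append(0.0)   # anchor x_0 = 0 (fixes the gauge)

x, *_ = np.linalg.lstsq(np.array(rows), np.array(b), rcond=None)
```

Pure dead reckoning would put pose 4 at the sum of the odometry, 0.15 away from the start; the least-squares solution spreads that residual over all constraints, pulling the trajectory back toward consistency.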
Loop closure detection
During vehicle motion, cumulative error inevitably arises (according to Formula (1)), leading to unreliable long-term estimates and failures to establish globally consistent trajectories and maps. Loop closure detection determines whether the vehicle has returned to a previously visited position by calculating the similarity between maps, and transmits the detection result to the back-end for optimization (the loop closure detection process is shown in
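Image similarity for loop closure can be sketched with global appearance descriptors. In the toy below, a normalized intensity histogram stands in for the bag-of-visual-words vectors commonly used in practice (the synthetic "places" are made-up data):

```python
import numpy as np

def descriptor(img, bins=16):
    """Global appearance descriptor: a unit-norm intensity histogram
    (a simple stand-in for bag-of-visual-words vectors)."""
    h, _ = np.histogram(img, bins=bins, range=(0.0, 1.0))
    h = h.astype(np.float64)
    return h / np.linalg.norm(h)

def similarity(d1, d2):
    """Cosine similarity of two unit-norm descriptors."""
    return float(d1 @ d2)

rng = np.random.default_rng(1)
place_a = rng.uniform(size=(64, 64))
place_b = rng.beta(2.0, 5.0, size=(64, 64))   # different appearance statistics
revisit = np.clip(place_a + rng.normal(0, 0.01, place_a.shape), 0, 1)

# the revisited place should score higher than a distinct place
s_loop = similarity(descriptor(place_a), descriptor(revisit))
s_other = similarity(descriptor(place_a), descriptor(place_b))
```

A loop is declared when the similarity exceeds a threshold; the candidate match is then handed to the back-end as an extra constraint, as described above.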
Challenges in underwater visual SLAM
Compared with laser and sonar methods, visual SLAM is not only cheap and easy to implement and install, but also well suited to creating dense maps, because it captures rich features [88]. For more complex tasks, such as reconstruction [89] and interaction [90], visual SLAM has further advantages. However, the underwater environment is often unstructured and dynamic, full of various kinds of noise, which poses many great challenges for underwater visual SLAM:
1. The sensor data are noisy. As a result, the
Conclusions and outlook
Starting from the basic framework of visual SLAM, this article explores the sensors, front-end visual odometry, back-end state optimization and loop closure detection related to underwater visual SLAM. It then reviews and analyzes the development of underwater visual SLAM in recent years and discusses the challenges it still faces. Despite these challenges, underwater visual SLAM has made considerable progress. Compared with other SLAM solutions, such as
CRediT authorship contribution statement
Song Zhang: Searching and finalizing articles, Formulating research questions, Data extraction, Data cross-checking and analyzing, Writing initial draft, Revising and finalizing article. Shili Zhao: Searching and finalizing articles, Revising and finalizing article. Dong An: Searching and finalizing articles, Data cross-checking and analyzing, Revising and finalizing article, Supervision. Jincun Liu: Searching and finalizing articles, Data cross-checking and analyzing, Revising and finalizing
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This paper was supported by Ministry of Science and Technology of the People’s Republic of China (Grant No. 2019YFE0103700), Hebei Province Department of Science and Technology (Grant No. 20327217D) and Shandong Province Department of Science and Technology (Grant No. 2021TZXD006).
References (95)
- Improving RGB-D SLAM in dynamic environments: A motion removal approach. Robot. Auton. Syst. (2017)
- Motion removal for reliable RGB-D SLAM in dynamic environments. Robot. Auton. Syst. (2018)
- Indoor SLAM application using geometric and ICP matching methods based on line features. Robot. Auton. Syst. (2018)
- Speeded-up robust features (SURF). Comput. Vis. Image Underst. (2008)
- Automatic red-channel underwater image restoration. J. Vis. Commun. Image Represent. (2015)
- An open-source bio-inspired solution to underwater SLAM. IFAC-PapersOnLine (2015)
- Deep learning for natural language processing in radiology—fundamentals and a systematic review. J. Am. Coll. Radiol. (2020)
- A survey on deep learning for big data. Inf. Fusion (2018)
- Underwater object tracking using sonar and USBL measurements. J. Sens. (2016)
- On the representation and estimation of spatial uncertainty. Int. J. Robot. Res. (1986)