Robust RGB-D simultaneous localization and mapping using planar point features

https://doi.org/10.1016/j.robot.2015.03.007

Highlights

  • We propose planar point features for accurate alignment of point features.

  • Comparative results of planar and classic features are given.

  • Planar features improve the accuracy and robustness of RGB-D SLAM systems.

Abstract

RGB-D cameras such as the PrimeSense and Microsoft Kinect are popular sensors in simultaneous localization and mapping research on mobile robots because they provide both vision and depth information. Most state-of-the-art RGB-D SLAM systems employ the Iterative Closest Point (ICP) algorithm to align point features, whose spatial positions are computed from the corresponding depth data of the sensor. However, the depth measurements of features are often disturbed by noise because visual features tend to lie at the margins of real objects. In order to reduce the estimation error, we propose a method that extracts and selects features with reliable depth values, i.e., planar point features. The planar features improve the accuracy and robustness of traditional ICP while keeping the computation cost reasonable for real-time applications. An efficient RGB-D SLAM system based on planar features is also demonstrated, with trajectory and map results from open datasets and from a physical robot in real-world experiments.

Introduction

Recent years have seen great improvements in simultaneous localization and mapping (SLAM), which serves as the core of a wide range of robotic applications such as autonomous navigation, manipulation, telepresence and service robots [1], [2], [3]. The choice of sensor is a crucial part of any SLAM system; options include Laser Range Finders (LRFs) [4], cameras (monocular, stereo or omnidirectional) [5], [6], [7], [8], RGB-D sensors [9], [10], [11], [12], [13], or a fusion of them [14]. Among these, RGB-D sensors such as the Microsoft Kinect and PrimeSense, which are typically based on infrared structured light or time-of-flight sensing, have attracted great interest because of their low cost, light weight and ability to provide both color and per-pixel depth information. SLAM systems using RGB-D sensors are therefore widely investigated and raise many challenging yet attractive problems, such as the restricted Field-of-View (FoV) and the noise in the depth data acquired by the sensor.

An RGB-D SLAM system typically consists of two parts. The front end, also known as the graphic end, performs image processing, e.g., image acquisition, feature extraction, frame-to-frame alignment and loop closure detection. The back end, or the SLAM end, deals with global optimization to integrate all previous observations into a global model. Loop closure detection is often modeled as a recognition problem and tends to be an independent subsystem; researchers have proposed many appearance-only loop closure detection approaches [15], [16], [17].

In the front end, point features such as FAST [18], Harris corners [19], SIFT [20] and SURF [21] are widely used as landmarks in most vision-based SLAM systems because they are convenient to detect and manage [20], [22]. Point features are detected in each frame and then matched between two adjacent frames according to their descriptors. After that, the Iterative Closest Point (ICP) algorithm or one of its variants is applied to estimate the ego-motion. The input of ICP is the spatial positions of the feature points, which are calculated from the depth values of the RGB-D camera. However, visual features are drawn to corners in the grayscale image, because the detection algorithm usually searches for extrema of some difference function between a pixel and its neighborhood, such as the maximal Difference of Gaussians (DoG) for SIFT and the sum of absolute differences for FAST. As a result, visual features, which serve as important landmarks in vision-based SLAM, are likely to occur at the margins of objects, where the depth data from the RGB-D sensor is often invalid or highly uncertain [23]. Furthermore, features falling on missing parts of the depth image cannot provide any useful information for ICP, increasing the possibility of encountering feature-less situations.
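To make the geometry concrete, the sketch below back-projects a detected feature pixel into a 3D camera-frame point with the standard pinhole model; the intrinsics fx, fy, cx and cy are hypothetical Kinect-like values, not parameters taken from the paper. A feature whose depth reading is missing or invalid, as often happens at object margins, yields no 3D point and therefore contributes nothing to ICP.

```python
import numpy as np

# Hypothetical Kinect-like pinhole intrinsics (not from the paper).
fx, fy, cx, cy = 525.0, 525.0, 319.5, 239.5

def back_project(u, v, z):
    """Return the 3D camera-frame point for pixel (u, v) at depth z,
    or None when the depth reading is invalid (zero/NaN), as commonly
    happens at the margins of objects."""
    if not np.isfinite(z) or z <= 0.0:
        return None
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])
```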

In order to find more reliable features for estimation, this paper proposes a novel approach that first extracts planes from the point cloud acquired by the RGB-D camera and then exploits the texture on the extracted planes. The depth values of features on a plane can be corrected according to the parameters of the plane model. Point features are then detected and matched on each pair of planes within the RANdom SAmple Consensus (RANSAC) framework [24]. In this way the features on planes, which are considered more accurate, are picked out of the noisy data. A comparison of the accuracy, robustness and computation cost of planar features against traditional visual-only features is given. Furthermore, results of the proposed RGB-D SLAM system, including estimated trajectories and point cloud maps, are demonstrated on an open dataset [25] and in physical experiments.
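The depth correction can be pictured as intersecting the feature's viewing ray with the fitted plane, replacing the noisy measured depth with a model-consistent one. The sketch below is one plausible realization, assuming planes are parameterized as n · p = d with unit normal n; the function name and default intrinsics are illustrative, not the authors' implementation.

```python
import numpy as np

def correct_depth_on_plane(u, v, n, d, fx=525.0, fy=525.0, cx=319.5, cy=239.5):
    """Snap a planar feature at pixel (u, v) onto the plane n . p = d by
    intersecting its viewing ray with the plane model."""
    ray = np.array([(u - cx) / fx, (v - cy) / fy, 1.0])  # un-normalized viewing ray
    denom = np.dot(n, ray)
    if abs(denom) < 1e-9:   # ray (almost) parallel to the plane: no intersection
        return None
    t = d / denom           # scale chosen so that n . (t * ray) = d
    if t <= 0.0:            # intersection lies behind the camera
        return None
    return t * ray
```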

The remainder of this paper is organized as follows: the next section gives an overview of current RGB-D SLAM research. Section 3 presents an analysis of visual features and then introduces the method for detecting and matching planar features. The RGB-D SLAM system is demonstrated in Section 4. The results on the open dataset and from physical experiments are presented in Section 5. Section 6 concludes the paper.

Section snippets

Related work

Visual SLAM (vSLAM) systems employ monocular, stereo or RGB-D cameras as the main sensing means to estimate the trajectory of the robot and construct maps of the environment [26]. Many issues are relevant to vSLAM, such as how to choose the framework (filter or pose graph) [27], how to choose the features [22], [28] and how to detect loops [16], [29]. We mainly review the work on feature selection because this paper focuses on that topic.

There are numerous visual

Problem definition

The main contribution of this paper is a robust method for detecting and registering keypoints in RGB-D SLAM systems. Here the meaning of "robust" should be clarified. We expect the registration to keep its accuracy, or at least not to fail frequently, in the presence of:

  • The noise in the sensor data.

  • The lack of features (points, planes, or other primitives).

  • The sudden motion of the robot.

First, the reason why classical visual features may find it hard to

RGB-D SLAM

We build an RGB-D SLAM system that uses planar features to align keyframes. A schematic overview of the system is illustrated in Fig. 6. The system consists of two parts: a graphic end, which performs image processing and ego-motion estimation, and an optimization end, which performs global pose graph optimization based on the g2o framework [44].
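As a rough illustration of the bookkeeping that connects the two ends, the sketch below records keyframe poses and relative-motion constraints in a minimal pose-graph container. g2o itself is a C++ framework, so this Python stand-in only mirrors the vertex/edge structure handed to the optimizer; all names and values here are hypothetical.

```python
import numpy as np

class PoseGraph:
    """Minimal stand-in for the vertex/edge bookkeeping of a pose graph."""
    def __init__(self):
        self.vertices = {}  # keyframe id -> 4x4 pose estimate in the world frame
        self.edges = []     # (id_i, id_j, 4x4 relative transform, 6x6 information)

    def add_keyframe(self, kf_id, pose):
        self.vertices[kf_id] = pose

    def add_constraint(self, i, j, T_ij, info=None):
        # Relative-motion constraint from planar-feature alignment (odometry)
        # or from a detected loop closure.
        self.edges.append((i, j, T_ij, np.eye(6) if info is None else info))

graph = PoseGraph()
graph.add_keyframe(0, np.eye(4))
T_01 = np.eye(4)
T_01[:3, 3] = [0.1, 0.0, 0.0]                    # hypothetical 10 cm forward motion
graph.add_keyframe(1, graph.vertices[0] @ T_01)  # chain odometry for the initial guess
graph.add_constraint(0, 1, T_01)
```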

Results

To evaluate the performance of matching planar features, a series of experiments is presented: (1) planar feature detection and matching, where the average pose error and the failure rate are computed to compare the performance of normal and planar features (here the normal features refer to the classic visual-only features); (2) the trajectories and maps built from the open dataset provided by [25]; (3) physical SLAM experiments with a Turtlebot in a real-world indoor laboratory environment. Results
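As a simplified stand-in for the benchmark's pose-error evaluation, the sketch below averages the translational error between estimated and ground-truth poses; it assumes the two trajectories are already time-associated and expressed in a common frame, and it ignores rotational error.

```python
import numpy as np

def avg_translational_error(estimated, ground_truth):
    """Mean Euclidean distance between the translation parts of paired
    4x4 homogeneous pose matrices."""
    errors = [np.linalg.norm(Te[:3, 3] - Tg[:3, 3])
              for Te, Tg in zip(estimated, ground_truth)]
    return float(np.mean(errors))
```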

Conclusion

In this paper we focus on the problem of selecting robust and efficient point features for visual SLAM systems. First, we discuss the problem that visual features from RGB-D cameras usually carry large noise in their depth data. To reduce the error of ICP, we propose a method for detecting and utilizing planar point features. The planar features take advantage of both visual and spatial information. They are detected on the planes of the point cloud obtained by RGB-D cameras. We further build an RGB-D

References (44)

  • A. Davison et al., MonoSLAM: real-time single camera SLAM, IEEE Trans. Pattern Anal. Mach. Intell. (2007)

  • R. Sim et al., A study of the Rao-Blackwellised particle filter for efficient and accurate vision-based SLAM, Int. J. Comput. Vis. (2007)

  • M. Liu et al., Topological mapping and scene recognition with lightweight color descriptors for an omnidirectional camera, IEEE Trans. Robot. (2014)

  • F. Endres et al., 3-D mapping with an RGB-D camera, IEEE Trans. Robot. (2014)

  • P. Henry et al., RGB-D mapping: using Kinect-style depth cameras for dense 3D modeling of indoor environments, Int. J. Robot. Res. (2012)

  • A. Bachrach et al., Estimation, planning, and mapping for autonomous flight using an RGB-D camera in GPS-denied environments, Int. J. Robot. Res. (2012)

  • M. Labbe et al., Appearance-based loop closure detection for online large-scale and long-term operation, IEEE Trans. Robot. (2013)

  • M. Cummins et al., Appearance-only SLAM at large scale with FAB-MAP 2.0, Int. J. Robot. Res. (2011)

  • K. Ho et al., Detecting loop closure with scene sequences, Int. J. Comput. Vis. (2007)

  • E. Rosten et al., Machine learning for high-speed corner detection, Proc. Eur. Conf. Comput. Vis. (2006)

  • J. Shi et al., Good features to track, Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (1994)

  • D.G. Lowe, Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vis. (2004)

    Xiang Gao received the B.E. degree in automatic control science and engineering from Tsinghua University, Beijing, China, in 2012 and is currently pursuing the Ph.D. degree in control science and engineering at Tsinghua University. His research interests include computer vision, simultaneous localization and mapping, and robotics.

    Tao Zhang received the Ph.D. degree in control science and engineering from Tsinghua University, Beijing, China, in 1999 and the Ph.D. degree in electrical engineering from Saga University, Saga, Japan, in 2002.

    He became an Associate Professor with Saga University in 2002 and a Research Scientist at the National Institute of Informatics, Tokyo, Japan, in 2003. In 2006, he became an Associate Professor in the Department of Automation, Tsinghua University. His current research interests include pattern recognition, nonlinear system control, robotics, control engineering and artificial intelligence.
