Elsevier

Robotics and Autonomous Systems

Volume 62, Issue 9, September 2014, Pages 1259-1270

Vision-based sparse topological mapping

https://doi.org/10.1016/j.robot.2014.03.015

Highlights

  • A hierarchical topological mapping algorithm with sparse node representation.

  • Hierarchical Inverted Files are proposed for efficient two-level map storage.

  • Various filters to process the similarity vectors at node and image levels.

  • A relative motion model to correlate odometry data of present and previous visits.

  • Comparison with state-of-the-art techniques, and analysis of accuracy and sparsity.

Abstract

Most existing appearance-based topological mapping algorithms produce dense topological maps in which each image stands as a node in the topological graph. Sparser maps can be built by representing groups of visually similar images of a sequence as nodes of a topological graph. In this paper, we present a sparse/hierarchical topological mapping framework which uses Image Sequence Partitioning (ISP) to group visually similar images of a sequence into nodes, which are then connected on the occurrence of loop closures to form a topological graph. An indexing data structure called Hierarchical Inverted File (HIF) is proposed to store the sparse maps so as to perform loop closure at two resolutions of the map, namely the node level and the image level. TF-IDF weighting is combined with spatial and frequency constraints on the detected features for improved loop closure robustness. Our approach is compared with two other existing sparse mapping approaches which use ISP. The sparsity, efficiency and accuracy of the resulting maps are evaluated and compared to those of the other two techniques on publicly available outdoor omni-directional image sequences.

Introduction

Mapping is one of the fundamental problems of autonomous mobile robotics. Mapping approaches can be broadly categorized as topological or metrical [1]. Metrical mapping involves accurate position estimates of the robot and of the landmarks in the environment. Topological mapping, on the other hand, represents an environment as a graph in which nodes correspond to places and the edges between them indicate some form of connectivity. Recently, a third category called Topo-Metric Mapping [2], [3] has been gaining popularity. Topo-Metric mapping is a hybrid approach which uses both metrical and topological information in map building. Whether metrical or topological, building an accurate map depends on loop closure accuracy. Accurate maps are difficult to build from metrical information alone, since it is prone to gross errors in the position estimates of the robot and landmarks. Topological maps, in contrast, rely only on detecting the topological connectivity of locations rather than on precise metrical accuracy. This connectivity is signaled by loop closure, which makes loop closure the crux of any topological mapping algorithm.

Many powerful vision-based topological mapping techniques that rely heavily on loop closure have been proposed over the past decade [4], [5], [6], [7], [8]. Most of them produce dense topological maps, in which every acquired image stands as a node in the topological graph. Alternatively, sparse topological maps can be built in which each node represents a group of images rather than an individual image. The images belonging to a node are sequentially contiguous, and hence spatially close and visually similar, and collectively they represent a place in the environment. Each node in a sparse topological map can be understood as a place: a region of the environment throughout which visual appearance remains more or less constant. Sparse topological maps contain far fewer nodes than the total number of images, hence the name sparse. Sparse maps can also be interpreted as two-level hierarchical maps, since they consist of nodes which in turn consist of images. Some important advantages of sparse maps are:

  • Loop closure uses all the images of a node/place rather than comparing individual images, and hence accuracy can be increased.

  • Hierarchical loop closure quickly shrinks the search space to a few images, which can then be matched thoroughly while still retaining real-time operation.

  • Place representation in maps can aid in accurate semantic labeling of topological map nodes  [9], [10] that can be used for lifelong operation and navigation of robots.

  • Accurate and efficient map merging  [11].
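The following minimal Python sketch shows how the sparse map described above can be represented. Each node stores a run of consecutive image ids, and a loop closure adds an edge between two existing nodes. `SparseTopologicalMap` and its methods are hypothetical names. The sequential edges between consecutive nodes, and the rule of adding an edge (rather than, say, merging revisited images into the matched node), are assumptions for illustration and are not taken from the paper.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a sparse topological map; the sequential edges and the
# edge-per-loop-closure update rule are assumptions for illustration.


@dataclass
class SparseTopologicalMap:
    nodes: list = field(default_factory=list)  # nodes[k] = image ids of place k, in order
    edges: set = field(default_factory=set)    # undirected edges stored as (smaller, larger)

    def start_node(self, image_id):
        """Open a new place node and link it to the previous node."""
        self.nodes.append([image_id])
        new_id = len(self.nodes) - 1
        if new_id > 0:
            self.edges.add((new_id - 1, new_id))
        return new_id

    def add_to_current(self, image_id):
        """Append an image to the most recent node (same place)."""
        self.nodes[-1].append(image_id)

    def add_loop_closure(self, node_a, node_b):
        """Connect two nodes recognised as the same place."""
        if node_a != node_b:
            self.edges.add((min(node_a, node_b), max(node_a, node_b)))


m = SparseTopologicalMap()
m.start_node(0)
m.add_to_current(1)
m.add_to_current(2)
m.start_node(3)
m.add_to_current(4)
m.start_node(5)
m.add_loop_closure(2, 0)
print(len(m.nodes), sorted(m.edges))  # 3 [(0, 1), (0, 2), (1, 2)]
```

In this toy run, six images collapse into three nodes, which is the sense in which the map is sparse.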

Breaking a sequence of images acquired by the robot into nodes/places is called Image Sequence Partitioning (ISP); we perform it using a similarity evaluation based on local feature matching.

To facilitate hierarchical map representation and feature indexing, we propose a data structure called Hierarchical Inverted File (HIF). As opposed to traditional inverted files [12], [13], HIFs store features hierarchically at two levels: the node level and the image level. HIFs enable loop closure at two resolutions. A coarse node-level loop closure finds the most similar node, and a finer image-level loop closure pinpoints the most similar image inside that node. Constraints that exploit the spatial locations and occurrence frequencies of features in images are used to strengthen image-level loop closure and thereby eliminate false positives. Loop closures are validated with a filter that exploits their temporal consistency and also depends on the vehicle velocity. This paper is an extension of our previous work [14], [15].
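The two-level organization can be illustrated with a small Python sketch. This is not the authors' HIF implementation. The nested layout (visual word → node id → image id → occurrence count) and the plain voting scores are assumptions. The TF-IDF weighting, the spatial and frequency constraints, and the temporal filter described in the paper are all omitted.

```python
from collections import Counter, defaultdict

# Hypothetical sketch of a two-level (node/image) inverted file; the layout and
# the unweighted voting are assumptions, not the authors' HIF implementation.


class HierarchicalInvertedFile:
    def __init__(self):
        # visual word -> {node id -> Counter(image id -> occurrences of the word)}
        # (counts are kept for frequency-based checks; the votes below ignore them)
        self._index = defaultdict(lambda: defaultdict(Counter))

    def add_image(self, node_id, image_id, words):
        """Index every visual word of one image under its node."""
        for w in words:
            self._index[w][node_id][image_id] += 1

    def node_votes(self, query_words):
        """Coarse level: each distinct query word votes once for every node containing it."""
        votes = Counter()
        for w in set(query_words):
            for node_id in self._index.get(w, {}):
                votes[node_id] += 1
        return votes

    def image_votes(self, query_words, node_id):
        """Fine level: the same voting, restricted to the images of one node."""
        votes = Counter()
        for w in set(query_words):
            for image_id in self._index.get(w, {}).get(node_id, {}):
                votes[image_id] += 1
        return votes


hif = HierarchicalInvertedFile()
hif.add_image(node_id=0, image_id=0, words=[1, 2, 3])
hif.add_image(node_id=0, image_id=1, words=[2, 3, 4])
hif.add_image(node_id=1, image_id=2, words=[7, 8, 9])

query = [2, 3, 4]
best_node = hif.node_votes(query).most_common(1)[0][0]
best_image = hif.image_votes(query, best_node).most_common(1)[0][0]
print(best_node, best_image)  # 0 1
```

The image-level pass only reads the postings stored under the winning node. This is where the search-space reduction of hierarchical loop closure comes from.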

Experiments were performed using omni-directional images from two of our own outdoor datasets and panoramic images from the popular NewCollege dataset [16], all of which are publicly available. We compare our approach to two other mapping approaches that use ISP: the first is based on GIST [17], [18] and the second uses optical flow [19]. We evaluate the sparsity and accuracy of the maps constructed with the different ISP techniques and demonstrate the time efficiency of the HIF representation for loop closure.

Section snippets

Related work

Scene Change Detection and Key Frame Selection for video segmentation and abstraction [20], [21] have goals similar to those of ISP. They try to represent a video with fewer images, called key frames, selected whenever there is a sufficient change in the scene, and most of them focus on the video compression domain. The major difference between these video abstraction problems and mapping is that mapping demands the localization of a query image which is obtained at a previously visited place, but with...

Framework overview

Fig. 1 depicts a global overview of our framework. Given a query (newly acquired) image, local image features are extracted and quantized into visual words. Any of the many existing local feature detectors, such as SIFT or SURF, can be used for local feature extraction. The local features are quantized into visual words using a visual vocabulary tree learned on a training dataset. The features extracted from the query image are then used for a...
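Quantization with a vocabulary tree, in the style of Nister et al., reduces to a greedy descent. At each level, the descriptor follows the child whose centroid is nearest, and the leaf it reaches gives the visual word. The Python sketch below assumes a tree has already been trained, for example by hierarchical k-means. `VocabNode`, `quantize` and the toy two-dimensional centroids are hypothetical; real SIFT or SURF descriptors have 128 or 64 dimensions.

```python
import math

# Hypothetical sketch: a pre-trained vocabulary tree is assumed to exist.


class VocabNode:
    """One node of a vocabulary tree (e.g. built by hierarchical k-means)."""

    def __init__(self, centroid, children=None, word_id=None):
        self.centroid = centroid          # cluster centre, same length as a descriptor
        self.children = children or []    # empty list for leaf nodes
        self.word_id = word_id            # visual word id, only set on leaves


def quantize(descriptor, root):
    """Greedy descent: follow the nearest child centroid until a leaf is reached."""
    node = root
    while node.children:
        node = min(node.children, key=lambda c: math.dist(c.centroid, descriptor))
    return node.word_id


# Toy tree with branching factor 2 and depth 2 (four visual words).
leaves = [VocabNode([0, 0], word_id=0), VocabNode([0, 2], word_id=1),
          VocabNode([10, 0], word_id=2), VocabNode([10, 2], word_id=3)]
root = VocabNode(None, children=[VocabNode([0, 1], leaves[:2]),
                                 VocabNode([10, 1], leaves[2:])])
print(quantize([9.5, 1.8], root))  # 3
```

Each descriptor costs only (branching factor × depth) distance computations instead of one per visual word. This is why vocabulary trees scale to large vocabularies.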

Image Sequence partitioning

Image Sequence Partitioning (ISP) answers the question: does the query image belong to the current place or to a new place? We use a similarity criterion based on Local Feature Matching (LFM) to measure the similarity between the current place and the query image. Let $I_t$ be the query image, $N_c$ the current place node and $I_p$ the primordial image of the current place. The query image $I_t$ is considered similar to the current node $N_c$ if the percentage of matches between the query image feature set $F_t$...
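The criterion above is only partly stated, so this Python sketch fills in the missing details with explicit assumptions:

- descriptors are matched with a nearest-neighbour ratio test;
- the match percentage is taken relative to the number of query features in $F_t$;
- the 0.3 threshold is hypothetical.

The sketch shows only the shape of the decision: keep $I_t$ in $N_c$ when enough of its features match those of the primordial image $I_p$, and otherwise start a new node.

```python
import math

# Hypothetical sketch of an LFM-style ISP test; the ratio test, the
# denominator (number of query features) and the threshold are assumptions.


def match_count(query_feats, ref_feats, ratio=0.8):
    """Count query descriptors whose nearest reference descriptor passes a ratio test."""
    matches = 0
    for q in query_feats:
        dists = sorted(math.dist(q, r) for r in ref_feats)
        # Accept only if the best match is clearly better than the second best.
        if len(dists) >= 2 and dists[0] < ratio * dists[1]:
            matches += 1
    return matches


def belongs_to_current_place(query_feats, primordial_feats, threshold=0.3):
    """True: I_t stays in the current node N_c. False: a new place node should start."""
    if not query_feats:
        return False
    percentage = match_count(query_feats, primordial_feats) / len(query_feats)
    return percentage >= threshold


primordial = [[0, 0], [5, 5], [10, 10]]  # toy 2-D descriptors of I_p
print(belongs_to_current_place([[0.1, 0], [5, 5.2], [9.9, 10]], primordial))  # True
print(belongs_to_current_place([[20, -20], [-20, 20]], primordial))  # False
```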

Visual word indexing

Visual words, which are quantized local image features, simplify and accelerate feature matching tasks due to their compact representation. In our framework, visual words are used for node-level and image-level loop closure operations. Storing visual words in memory in a way that facilitates their easy access is called indexing, and it can result in optimized loop closure performance. Indexing is an essential need of numerous text retrieval applications and is commonly performed...
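In text retrieval, an inverted file maps each term to the documents containing it, so a query only touches documents that share at least one term with it. The Python sketch below applies this idea to visual words with TF-IDF weighting, which the paper combines with its spatial and frequency constraints. It is a flat, single-level index. The exact weighting is an assumption rather than the paper's definition:

- term frequency is normalized by the number of words in the image;
- idf = log(N / n_i).

```python
import math
from collections import Counter, defaultdict

# Hypothetical flat inverted file with TF-IDF scoring; the exact weighting
# (tf = count / words in image, idf = log(N / n_i)) is an assumption.


class InvertedFile:
    def __init__(self):
        self.postings = defaultdict(dict)   # visual word -> {image id: count}
        self.n_words = {}                   # image id -> number of words in the image

    def add_image(self, image_id, words):
        for w, c in Counter(words).items():
            self.postings[w][image_id] = c
        self.n_words[image_id] = len(words)

    def idf(self, word):
        """log(N / n_i): rare words weigh more; unseen words weigh nothing."""
        n_i = len(self.postings.get(word, {}))
        return math.log(len(self.n_words) / n_i) if n_i else 0.0

    def query(self, words):
        """Sum the TF-IDF weights of the distinct query words in each image."""
        scores = Counter()
        for w in set(words):
            idf = self.idf(w)
            # Only images in this word's posting list are ever touched.
            for image_id, c in self.postings.get(w, {}).items():
                scores[image_id] += (c / self.n_words[image_id]) * idf
        return scores


inv = InvertedFile()
inv.add_image(0, [1, 1, 2])
inv.add_image(1, [2, 3])
inv.add_image(2, [4, 5])
print(inv.query([1, 2]).most_common(1)[0][0])  # 0
```

In a HIF, each posting list would additionally be grouped by node. The same traversal could then first score nodes and then score images inside the selected node.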

Loop closure

As shown in Fig. 1, loop closure is performed in two stages: first at the node level and then at the image level. This section describes these two loop closure processes in detail.

Similar approaches

This section provides a brief introduction to two recent approaches, based on GIST and on optical flow respectively, which also build topological maps by partitioning image sequences and simultaneously use the maps to perform loop closure. Comparisons with respect to map sparsity, accuracy and computational time are provided in Section 8.

Experiments

Three image sequences are used in total to evaluate our approach. Two of them are our own publicly available datasets¹ and the third is the NewCollege dataset.²

Our datasets were acquired with a VIPALAB platform, a car-like vehicle designed to serve as a prototype for research and...

Discussion

We proposed a sparse/hierarchical topological mapping framework that organizes the visual features of image sequences into node and image levels using Hierarchical Inverted Files. Image Sequence Partitioning is used to partition image sequences into nodes which represent different places in the environment. Loop closure is performed hierarchically at the node and image levels. Our LFM+HIF approach is compared with two state-of-the-art approaches which use ISP to produce sparse maps. The LFM+HIF approach...

Acknowledgments

This work was funded by grants from the French program “Investissement d’avenir” (Investments for the Future) managed by the National Research Agency (ANR), the European Commission (Auvergne FEDER funds) and the Région Auvergne in the framework of the LabEx IMobS3 (ANR-10-LABX-16-01), and by the ANR project ARMEN (ANR-09-TECS-020).

Hemanth Korrapati received his M.S. by research degree from the International Institute of Information Technology, Hyderabad (IIIT-H), India in 2009.

He received his Ph.D. in Vision for Robotics at the Institut Pascal (formerly LASMEA-CNRS), Clermont-Ferrand.

His research focuses on the use of vision-based mapping, SLAM and machine learning applications.

References (36)

  • Z. Zivkovic et al., From images to rooms, Robot. Auton. Syst. (2007).
  • J. Kosecká et al., Global localization and relative positioning based on scale-invariant keypoints, Robot. Auton. Syst. (2005).
  • S. Thrun et al., Probabilistic Robotics (Intelligent Robotics and Autonomous Agents) (2005).
  • N. Tomatis, Hybrid, metric-topological representation for localization and mapping, in: Robotics and Cognitive...
  • B. Kuipers, J. Modayil, P. Beeson, M. MacMahon, F. Savelli, Local metrical and global topological maps in the hybrid...
  • A. Angeli et al., Fast and incremental method for loop-closure detection using bags of visual words, IEEE Trans. Robot. (2008).
  • A. Angeli, S. Doncieux, J.-A. Meyer, D. Filliat, Visual topological SLAM and global localization, in: ICRA, 2009, pp....
  • M. Cummins et al., FAB-MAP: probabilistic localization and mapping in the space of appearance, Int. J. Robot. Res. (2008).
  • M. Cummins, P. Newman, Highly scalable appearance-only SLAM-FAB-MAP 2.0, in: Proceedings of Robotics: Science and...
  • F. Fraundorfer et al., Topological mapping, localization and navigation using image collections.
  • A. Ranganathan, Pliss: detecting and labeling places using online change-point detection, in: Robotics: Science and...
  • A. Ranganathan, Pliss: labeling places using online changepoint detection, Auton. Robots (2012).
  • G. Erinc, S. Carpin, Anytime merging of appearance based maps, in: ICRA, 2012, pp....
  • N. Ziviani et al., Compression: A key for next-generation text retrieval systems, Computer (2000).
  • D. Nister et al., Scalable recognition with a vocabulary tree.
  • H. Korrapati, J. Courbon, Y. Mezouar, P. Martinet, Image sequence partitioning for outdoor mapping, in: ICRA, 2012, pp....
  • H. Korrapati, J. Courbon, Y. Mezouar, Topological mapping with image sequence partitioning, in: Frontiers of...
  • M. Smith et al., The new college vision and laser data set, Int. J. Robot. Res. (2009).

Youcef Mezouar received his Ph.D. degree in computer science from the University of Rennes 1, Rennes (France), in 2001. He has been a Postdoctoral Associate in the Robotics Laboratory, Computer Science Department, Columbia University, New York, NY, and currently holds a Professor position at the Institut Français de Mécanique Avancée (IFMA), France. His research interests include robotics, microrobotics, computer vision, and vision-based control.
