Vision-based sparse topological mapping
Introduction
Mapping is one of the fundamental problems of autonomous mobile robotics. Mapping approaches can be broadly categorized as topological or metrical [1]. Metrical mapping requires accurate position estimates of the robot and of the landmarks in the environment. Topological mapping, on the other hand, represents an environment as a graph in which nodes correspond to places and edges indicate some form of connectivity between them. Recently, a third category called Topo-Metric Mapping [2], [3] has been gaining popularity; it is a hybrid approach that uses both metrical and topological information to build the map. Building an accurate map, whether metrical or topological, depends on accurate loop closure. Such maps are difficult to build from metrical information alone, because position estimates of the robot and landmarks are prone to gross errors. Topological maps, in contrast, rely only on detecting the topological connectivity of locations rather than on precise metrical accuracy. This connectivity is signaled by loop closure, which makes loop closure the crux of any topological mapping algorithm.
Many powerful vision-based topological mapping techniques that rely heavily on loop closure have been proposed over the past decade [4], [5], [6], [7], [8]. Most of them produce dense topological maps, in which every acquired image is a node in the topological graph. Alternatively, sparse topological maps can be built in which each node represents a group of images rather than an individual image. The images belonging to a node are sequentially contiguous and hence spatially close and visually similar; together they represent a place in the environment. Each node in a sparse topological map can thus be understood as a place: a region of the environment throughout which the visual appearance remains more or less constant. Sparse topological maps contain far fewer nodes than the total number of images, hence the name. They can also be interpreted as two-level hierarchical maps, since they consist of nodes which in turn consist of images. Some important advantages of sparse maps are:
- Loop closure uses all the images of a node (place) rather than comparing individual images, which can increase accuracy.
- Loop closure can be performed hierarchically, quickly shrinking the search space to a few images that are then matched thoroughly, while still retaining real-time operation.
- Representing places in the map can aid accurate semantic labeling of topological map nodes [9], [10], which can be used for lifelong operation and navigation of robots.
- Maps can be merged accurately and efficiently [11].
To facilitate hierarchical map representation and feature indexing, we propose a data structure called the Hierarchical Inverted File (HIF). Unlike traditional inverted files [12], [13], HIFs store features hierarchically at two levels: the node level and the image level. HIFs enable loop closure at two resolutions: a coarse node-level loop closure, which finds the most similar node, and a finer image-level loop closure, which pinpoints the most similar image inside that node. Constraints that exploit the spatial locations and occurrence frequencies of features in images are used to strengthen the image-level loop closure and thereby eliminate false positives. Loop closures are then validated by a filter that exploits their temporal consistency and also depends on the vehicle velocity. This paper is an extension of our previous work [14], [15].
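The two-level organization of an HIF can be pictured as a nested mapping from visual word to node to image. The following is a minimal sketch under stated assumptions, not the authors' implementation: the class and method names (`HierarchicalInvertedFile`, `add_image`, `node_votes`, `image_votes`, `query`) are hypothetical, visual words are plain integers, and raw vote counts stand in for whatever similarity scores the paper uses. The spatial and frequency constraints and the temporal filter are omitted.

```python
from collections import Counter, defaultdict


class HierarchicalInvertedFile:
    """Hypothetical two-level inverted file: word -> node id -> image id -> count."""

    def __init__(self):
        # word -> {node_id -> Counter({image_id: occurrences of word in image})}
        self.index = defaultdict(lambda: defaultdict(Counter))

    def add_image(self, node_id, image_id, words):
        """Index the visual words of one image under the place node it belongs to."""
        for w in words:
            self.index[w][node_id][image_id] += 1

    def node_votes(self, query_words):
        """Coarse level: for each node, count distinct query words it contains."""
        votes = Counter()
        for w in set(query_words):
            for node_id in self.index.get(w, {}):
                votes[node_id] += 1
        return votes

    def image_votes(self, node_id, query_words):
        """Fine level: vote only among the images of a single node."""
        votes = Counter()
        for w in set(query_words):
            for image_id in self.index.get(w, {}).get(node_id, {}):
                votes[image_id] += 1
        return votes

    def query(self, query_words):
        """Coarse-to-fine lookup: best node first, then the best image inside it."""
        nodes = self.node_votes(query_words)
        if not nodes:
            return None  # no indexed image shares a word with the query
        best_node, _ = nodes.most_common(1)[0]
        best_image, _ = self.image_votes(best_node, query_words).most_common(1)[0]
        return best_node, best_image


hif = HierarchicalInvertedFile()
hif.add_image(0, 0, [1, 2, 3])
hif.add_image(0, 1, [2, 3, 4])
hif.add_image(1, 2, [7, 8, 9])
hif.add_image(1, 3, [8, 9, 10])
print(hif.query([8, 9, 10, 11]))  # (1, 3)
```

Because the image-level vote walks only the postings of the winning node, the fine search touches a small fraction of the indexed images, which is the intuition behind the speed-up of hierarchical loop closure.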
Experiments were performed on omnidirectional images from two of our own outdoor datasets and on panoramic images from the popular NewCollege dataset [16], all of which are publicly available. We compare our approach with two other mapping approaches that use image sequence partitioning (ISP): the first is based on GIST [17], [18] and the second uses optical flow [19]. We evaluate the sparsity and accuracy of the maps constructed with the different ISP techniques and demonstrate the time efficiency that the HIF representation brings to loop closure.
Related work
Scene change detection and key frame selection for video segmentation and abstraction [20], [21] have goals similar to those of ISP. They represent a video with fewer images, called key frames, selected whenever there is a sufficient change in the scene, and most of them focus on the video compression domain. The major difference between these video abstraction problems and mapping is that mapping requires localizing a query image obtained at a previously visited place, but with
Framework overview
Fig. 1 depicts a global overview of our framework. Given a query image, i.e., a newly acquired image, local image features are extracted and quantized into visual words. Any of the many existing local feature detectors, such as SIFT or SURF, can be used for feature extraction. Local features are quantized into visual words using a visual vocabulary tree learned on a training dataset. The features extracted from the query image are then used for a
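Quantizing a descriptor with a vocabulary tree amounts to a greedy descent: at each level the descriptor is compared with the cluster centers of the current node's children and follows the closest one, until it reaches a leaf whose index is the visual word. The sketch below assumes an already-trained tree (training, typically by hierarchical k-means, is not shown), uses squared Euclidean distance, and uses toy 2-D descriptors instead of real SIFT or SURF descriptors; `VocabNode` and `quantize` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VocabNode:
    """One node of a (hypothetical) trained vocabulary tree."""
    center: List[float]                                        # cluster center
    children: List["VocabNode"] = field(default_factory=list)  # empty for leaves
    word_id: Optional[int] = None                              # set on leaves only


def squared_distance(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(root, descriptor):
    """Descend to the closest child at each level; return the leaf's visual word."""
    node = root
    while node.children:
        node = min(node.children, key=lambda c: squared_distance(c.center, descriptor))
    return node.word_id


# Toy two-level tree with branching factor 2 and 2-D descriptors.
root = VocabNode(center=[5.0, 5.0], children=[
    VocabNode(center=[0.0, 0.0], children=[
        VocabNode(center=[-1.0, 0.0], word_id=0),
        VocabNode(center=[1.0, 0.0], word_id=1),
    ]),
    VocabNode(center=[10.0, 10.0], children=[
        VocabNode(center=[9.0, 10.0], word_id=2),
        VocabNode(center=[11.0, 10.0], word_id=3),
    ]),
])

print(quantize(root, [0.8, 0.1]))   # 1
print(quantize(root, [10.9, 9.5]))  # 3
```

With branching factor k and depth L, each descriptor needs about k*L distance computations instead of one per leaf (k^L leaves), which is what makes large vocabularies practical for online mapping.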
Image sequence partitioning
Image Sequence Partitioning (ISP) answers the question: Does the query image belong to the current place or to a new place? We use a Local Feature Matching (LFM)-based similarity criterion to measure similarity between the current place and the query image. Let be the query image, be the current place node and be the primordial image of the current place. A query image is considered similar to the current node if the percentage of matches between the of query image feature set
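The partitioning decision can be sketched as a threshold test on a match percentage. This is a minimal sketch, not the authors' LFM criterion. It assumes three things: a "match" is a query visual word that also occurs in the reference image (the paper matches local features directly); the reference is the first image of the current node (our reading of "primordial image"); and the threshold is a free parameter. `match_percentage`, `partition_sequence` and the default of 30% are hypothetical.

```python
def match_percentage(query_words, reference_words):
    """Percentage of query features whose visual word also occurs in the reference.

    Hypothetical stand-in: identical visual words replace descriptor-level matching.
    """
    if not query_words:
        return 0.0
    reference = set(reference_words)
    matched = sum(1 for w in query_words if w in reference)
    return 100.0 * matched / len(query_words)


def partition_sequence(images, threshold=30.0):
    """Group a sequence of images (lists of visual words) into place nodes.

    An image joins the current node if it matches the node's primordial image
    (assumed here to be the node's first image) at or above `threshold` percent;
    otherwise it starts a new node and becomes that node's primordial image.
    """
    nodes = []          # each node is a list of image indices
    primordial = None   # visual words of the current node's first image
    for idx, words in enumerate(images):
        if primordial is not None and match_percentage(words, primordial) >= threshold:
            nodes[-1].append(idx)
        else:
            nodes.append([idx])
            primordial = words
    return nodes


sequence = [[1, 2, 3, 4], [1, 2, 3, 5], [1, 2, 6, 7], [10, 11, 12, 13], [10, 11, 12, 14]]
print(partition_sequence(sequence, threshold=50.0))  # [[0, 1, 2], [3, 4]]
```

Because images are only ever appended to the current node, every node is a run of sequentially contiguous images, matching the definition of a place in a sparse map.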
Visual word indexing
Visual words, the quantized values of local image features, simplify and accelerate feature matching thanks to their compact representation. In our framework, visual words are used for both node-level and image-level loop closure. Storing visual words in memory in a way that makes them easy to access is called indexing, and it can optimize loop closure performance. Indexing is essential to numerous text retrieval applications and is commonly performed
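For contrast with the two-level HIF, a traditional (flat) inverted file maps each visual word directly to the images containing it, so scoring a query only visits images that share at least one word with it. The sketch below uses the standard text-retrieval tf-idf weighting with an unnormalized dot-product score; this weighting is an assumption for illustration, not necessarily the scoring used in this work, and `InvertedFile` is a hypothetical name.

```python
import math
from collections import Counter, defaultdict


class InvertedFile:
    """Hypothetical flat inverted file: word -> {image_id: term frequency}."""

    def __init__(self):
        self.postings = defaultdict(dict)
        self.num_images = 0

    def add_image(self, image_id, words):
        """Index one image (assumes each image_id is added only once)."""
        self.num_images += 1
        for w, tf in Counter(words).items():
            self.postings[w][image_id] = tf

    def idf(self, word):
        """Inverse document frequency; 0.0 for unseen words."""
        df = len(self.postings.get(word, {}))
        return math.log(self.num_images / df) if df else 0.0

    def score(self, query_words):
        """Dot product of tf-idf vectors, visiting only images that share a word."""
        scores = defaultdict(float)
        for w, q_tf in Counter(query_words).items():
            weight = self.idf(w)
            for image_id, tf in self.postings.get(w, {}).items():
                scores[image_id] += (q_tf * weight) * (tf * weight)
        return dict(scores)


inv = InvertedFile()
inv.add_image("a", [1, 1, 2])
inv.add_image("b", [2, 3])
inv.add_image("c", [4])
scores = inv.score([1, 3])
print(sorted(scores, key=scores.get, reverse=True))  # ['a', 'b']
```

Image "c" never appears in the scores because it shares no word with the query; this is the saving an inverted file provides over comparing the query against every stored image.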
Loop closure
As shown in Fig. 1, loop closure is performed in two stages: first at the node level and then at the image level. This section describes both loop closure processes in detail.
Similar approaches
This section provides a brief introduction to two recent approaches based on GIST and optical flow respectively, which also aim to build topological maps by partitioning image sequences and simultaneously use the maps to perform loop closure. Comparisons with respect to map sparsity, accuracy and computational time are provided in Section 8.
Experiments
Three image sequences are used to evaluate our approach: two of our own publicly available datasets and the NewCollege dataset.
Our datasets were acquired with a VIPALAB platform, a car-like vehicle designed to serve as a prototype for research and
Discussion
We proposed a sparse (hierarchical) topological mapping framework that organizes the visual features of image sequences into node and image levels using Hierarchical Inverted Files. Image Sequence Partitioning is used to divide image sequences into nodes that represent different places in the environment. Loop closure is performed hierarchically at the node and image levels. Our LFM+HIF approach is compared with two state-of-the-art approaches that use ISP to produce sparse maps. The LFM+HIF approach
Acknowledgments
This work was funded by grants from the French program investissement d’avenir managed by the National Research Agency (ANR), the European Commission (Auvergne FEDER funds) and the Région Auvergne in the framework of the LabEx IMobS3 (ANR-10-LABX-16-01) and by the ANR project ARMEN (ANR-09-TECS-020).
Hemanth Korrapati received his M.S. by research degree from the International Institute of Information Technology, Hyderabad (IIIT-H), India in 2009.
He received his Ph.D. in Vision for Robotics at the Institut Pascal (formerly LASMEA-CNRS), Clermont-Ferrand.
His research focuses on the use of vision-based mapping, SLAM and machine learning applications.
References (36)
- et al. From images to rooms. Robot. Auton. Syst. (2007)
- et al. Global localization and relative positioning based on scale-invariant keypoints. Robot. Auton. Syst. (2005)
- et al. Probabilistic Robotics (Intelligent Robotics and Autonomous Agents). (2005)
- N. Tomatis, Hybrid, metric-topological representation for localization and mapping, in: Robotics and Cognitive...
- B. Kuipers, J. Modayil, P. Beeson, M. MacMahon, F. Savelli, Local metrical and global topological maps in the hybrid...
- et al. Fast and incremental method for loop-closure detection using bags of visual words. IEEE Trans. Robot. (2008)
- A. Angeli, S. Doncieux, J.-A. Meyer, D. Filliat, Visual topological SLAM and global localization, in: ICRA, 2009, pp....
- et al. FAB-MAP: probabilistic localization and mapping in the space of appearance. Int. J. Robot. Res. (2008)
- M. Cummins, P. Newman, Highly scalable appearance-only SLAM-FAB-MAP 2.0, in: Proceedings of Robotics: Science and...
- et al. Topological mapping, localization and navigation using image collections.
- Pliss: labeling places using online changepoint detection. Auton. Robots.
- Compression: A key for next-generation text retrieval systems. Computer.
- Scalable recognition with a vocabulary tree.
- The new college vision and laser data set. Int. J. Robot. Res.
Youcef Mezouar received his Ph.D. degree in computer science from the University of Rennes 1, Rennes (France), in 2001. He has been Postdoctoral Associate in the Robotics Laboratory, Computer Science Department, Columbia University, New York, NY, and currently holds a Professor position at the Institut Français de Mécanique Avancée (IFMA), France. His research interests include robotics, microrobotics, computer vision, and vision-based control.