
Open access
Author
Date
2018
Type
Doctoral Thesis
ETH Bibliography
yes
Abstract
Research on unmanned ground vehicles (UGVs) and micro aerial vehicles (MAVs) has made great progress in recent years. These platforms typically come with complementary characteristics. Aerial robots can be rapidly deployed and survey large areas; however, their payload and operation times are limited, restricting on-board computation and the sensor suite they can carry. UGVs, on the other hand, offer extended operation times and high payloads, but are considerably slower than aerial robots. In many contemporary applications it is therefore highly desirable to use teams of such heterogeneous robots and exploit their complementary strengths. One major application is localization between these robots, e.g., rapidly constructing a map with a MAV and then localizing a UGV in this map. However, localization across different view-points and potentially different sensors is a difficult task. We therefore identify a need to advance the science of localization for heterogeneous robots.
The overall approach of this thesis is to investigate different abstractions of robotic mapping data that yield invariance to the heterogeneities of ground and aerial robots. We formulate the localization challenge as a feature matching problem. The first contribution focuses on localizing between vision and LiDAR data. Here, we propose abstracting one modality towards the other. Since localization in 3D data is more view-point invariant than in image data, we choose to represent both modalities in the LiDAR domain, i.e., as 3D point clouds. To this end, we either perform a dense reconstruction from the vision data or use sparse direct key-point mapping. The main challenge is then to bridge the different sampling characteristics of the data with a suitable data description. We find that 3D descriptors are a promising avenue for localization between the two modalities. While established 3D descriptors perform well when matching LiDAR against densely reconstructed data, they suffer when applied to sparse 3D key-point data from visual-inertial (VI) mapping. Inspired by successful invariant 2D image descriptors, we transfer their working principle to 3D space and propose a novel descriptor based on binary density comparisons of 3D points. Our experiments show that the descriptor works well for the challenge of localizing between vision and LiDAR data.
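To give a rough feel for the binary density-comparison idea, the sketch below (plain NumPy; function names, parameters, and values are illustrative assumptions, not the thesis implementation) builds one bit per randomly sampled pair of small cells inside a keypoint's support region by comparing how many points fall into each cell. Two such binary descriptors can then be compared cheaply via Hamming distance.

```python
import numpy as np

def density_comparison_descriptor(points, keypoint, radius=1.0,
                                  num_pairs=256, cell_radius=0.25, seed=0):
    """Toy binary descriptor from density comparisons around a 3D keypoint.

    Each bit encodes which of two randomly placed spherical cells inside the
    support region contains more points (a local density proxy).
    """
    rng = np.random.default_rng(seed)

    # Keep only points inside the support sphere around the keypoint.
    offsets = points - keypoint
    support = offsets[np.linalg.norm(offsets, axis=1) <= radius]

    # Sample pairs of cell centres inside the support region.
    centres_a = rng.uniform(-radius, radius, size=(num_pairs, 3))
    centres_b = rng.uniform(-radius, radius, size=(num_pairs, 3))

    bits = np.zeros(num_pairs, dtype=np.uint8)
    for i, (ca, cb) in enumerate(zip(centres_a, centres_b)):
        # Compare the point counts (densities) of the two cells.
        count_a = np.sum(np.linalg.norm(support - ca, axis=1) <= cell_radius)
        count_b = np.sum(np.linalg.norm(support - cb, axis=1) <= cell_radius)
        bits[i] = count_a > count_b
    return bits

def hamming_distance(d1, d2):
    """Binary descriptors are matched via Hamming distance."""
    return int(np.sum(d1 != d2))
```

Because the descriptor only compares relative densities rather than absolute point counts, a sketch like this is less sensitive to the very different sampling densities of LiDAR scans, dense reconstructions, and sparse VI key-point maps.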
While our first contribution is applicable to localization between the two most common mapping sensor configurations in robotics, we extend the formulation in the second part of this thesis. Instead of relying on an abstract appearance of the environment, e.g., 3D points or image features, we go a step further and consider the underlying structure of man-made environments. The semantic meaning of a scene does not change with view-point, appearance, or season, and with recent advances in semantic scene understanding, localization based on semantics is therefore a promising avenue. While preserving the overall formulation of the feature-based localization architecture, we develop a novel map representation and feature extraction method that accounts for the semantic information and spatial topology of scenes. Here, we represent maps as graphs of connected semantic instances. The localization problem is thus reduced to searching for a sub-graph, i.e., the query, in a potentially large graph, i.e., the database. However, sub-graph matching is an NP-complete problem, and its computation is prohibitively expensive for common robotic applications. Motivated by this, we propose graph descriptors that capture the local structure of sub-graphs, can conveniently be matched in a k-nearest-neighbour (KNN) fashion, and can therefore be used in our general abstraction-layer based localization framework. One concern with the 3D-based methods is scaling, as the computational effort for matching many high-dimensional features is generally high and grows linearly with the size of the environment. Using semantic graphs, we can represent the environment very compactly with a single vertex per semantic object, compared to multiple 3D key-points and features in our structure-based work. Hence, descriptor matching becomes more light-weight in larger-scale scenarios. We evaluate the effectiveness of our approach on km-scale scenarios with both simulated and real data. The results show a much higher degree of view-point invariance of the proposed approach compared to state-of-the-art appearance-based algorithms.
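As a minimal sketch of the graph-based pipeline, the code below (NumPy only; the descriptor choice, names, and parameters are assumptions for illustration and not the thesis's method) assigns each vertex of a semantic graph a normalised histogram of the semantic labels reachable within a few hops, and then matches query vertices against a database graph with brute-force KNN in descriptor space.

```python
import numpy as np

def vertex_descriptors(adjacency, labels, num_labels, hops=2):
    """Toy vertex descriptor for a semantic graph: a normalised histogram of
    the semantic labels reachable within `hops` edges of each vertex.
    An illustrative stand-in for the graph descriptors described in the text.
    """
    adj = np.asarray(adjacency, dtype=np.int64)
    n = adj.shape[0]
    reach = np.eye(n, dtype=np.int64)      # vertices reachable so far
    frontier = np.eye(n, dtype=np.int64)
    for _ in range(hops):
        frontier = (frontier @ adj > 0).astype(np.int64)
        reach = np.maximum(reach, frontier)

    descs = np.zeros((n, num_labels))
    for v in range(n):
        neighbourhood = np.flatnonzero(reach[v])
        descs[v] = np.bincount(labels[neighbourhood], minlength=num_labels)
    # Normalise so descriptors from graphs of different sizes are comparable.
    return descs / np.maximum(descs.sum(axis=1, keepdims=True), 1.0)

def knn_match(query_descs, db_descs, k=3):
    """Brute-force k-nearest-neighbour matching of query vertex descriptors
    against the database graph (an index structure would be used at scale)."""
    dists = np.linalg.norm(query_descs[:, None, :] - db_descs[None, :, :], axis=2)
    return np.argsort(dists, axis=1)[:, :k]
```

Since each semantic object contributes only a single vertex and descriptor, the database stays compact even for km-scale environments, which is the scaling advantage over matching many high-dimensional 3D key-point features.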
Permanent link
https://doi.org/10.3929/ethz-b-000315116
Publication status
published
External links
Search print copy at ETH Library
Publisher
ETH Zurich
Subject
Mapping; Localization; Semantic Segmentation; SLAM; ROBOT VISION; Robotics; ROBOT POSITION + ROBOT ORIENTATION
Organisational unit
03737 - Siegwart, Roland Y. / Siegwart, Roland Y.
Funding
609763 - Long-Term Human-Robot Teaming for Robot-Assisted Disaster Response (EC)
Related publications and datasets
Compiles: https://doi.org/10.3929/ethz-a-010819655