Ground and aerial meta-data integration for localization and reconstruction: A review
Introduction
Localization and reconstruction are two closely related research areas. Localization determines the location of the data acquisition equipment, e.g. a camera or laser scanner, within the scene, while reconstruction recovers the 3D information of the scene from the acquired meta-data, e.g. images or light detection and ranging (LIDAR) data.
Currently, approaches that tackle the localization problem fall roughly into two categories: image based methods and structure based methods. Image based methods cast localization as an image retrieval task and represent a scene as a database of geo-tagged images [62], [83], [98]. The location of the query image is then approximated by the geo-tag of the most relevant retrieved image; this process is also termed image geo-localization. In contrast, structure based methods cast localization as a camera resection task [84], [94], [116]: by matching features, the pose (location and orientation) of the data acquisition equipment (camera or laser scanner) is recovered. The estimated pose is relatively accurate and is typically used for robot localization or augmented reality (AR).
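The retrieval formulation above can be sketched in a few lines. The following is an illustrative toy, not taken from any reviewed method: it assumes image descriptors (e.g. CNN embeddings such as those produced by NetVLAD) have already been computed, and approximates the query location by the geo-tag of the nearest database descriptor under cosine similarity; all function and variable names are hypothetical.

```python
import numpy as np

def geolocalize_by_retrieval(query_desc, db_descs, db_geotags):
    """Approximate the query location by the geo-tag of the most
    similar database image (nearest neighbor in descriptor space)."""
    # Cosine similarity between the query and every database descriptor.
    q = query_desc / np.linalg.norm(query_desc)
    db = db_descs / np.linalg.norm(db_descs, axis=1, keepdims=True)
    sims = db @ q
    best = int(np.argmax(sims))
    return db_geotags[best], sims[best]

# Toy example: 3 database images with (lat, lon) geo-tags.
rng = np.random.default_rng(0)
db_descs = rng.normal(size=(3, 128))
db_geotags = [(48.858, 2.294), (40.689, -74.044), (51.501, -0.124)]
query = db_descs[1] + 0.01 * rng.normal(size=128)  # slightly perturbed copy of image 1
loc, score = geolocalize_by_retrieval(query, db_descs, db_geotags)
```

In practice the database is large and the brute-force similarity search is replaced by an approximate nearest-neighbor index, but the geo-tag transfer step is the same.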
According to the involved meta-data, scene reconstruction approaches broadly fall into two types: image based methods and laser based methods. The pipeline of image based methods contains: (1) structure-from-motion (SfM) [11], [16], [17], [18], [86] to calibrate camera poses, (2) multi-view stereo (MVS) [87], [92] to obtain a dense point cloud and (3) surface reconstruction [71], [101] to obtain a surface mesh. To achieve more delicate scene reconstructions, LIDAR data is sometimes involved [66], [103], [117]. Compared with image based methods, laser based methods are more accurate and less dependent on the circumstances, but they are costlier and less flexible.
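A core geometric step shared by the SfM and MVS stages of this pipeline is triangulating a 3D point from calibrated views. As a minimal sketch under simplifying assumptions (identity intrinsics, noise-free observations, names hypothetical), linear direct-linear-transform (DLT) triangulation from two views can be written as:

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two views.
    P1, P2: 3x4 projection matrices; x1, x2: 2D image observations."""
    # Each observation contributes two linear constraints on the
    # homogeneous 3D point X: x * (P[2]·X) - P[0]·X = 0, etc.
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector of A with the
    # smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

# Toy setup: identity intrinsics, second camera translated along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.5, 0.2, 4.0])
x1 = P1 @ np.append(X_true, 1.0)   # project into view 1
x1 = x1[:2] / x1[2]
x2 = P2 @ np.append(X_true, 1.0)   # project into view 2
x2 = x2[:2] / x2[2]
X_est = triangulate_dlt(P1, P2, x1, x2)
```

Real pipelines follow the linear estimate with a non-linear refinement (bundle adjustment) over all cameras and points, but the DLT step above captures the underlying geometry.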
Though both localization and reconstruction methods have achieved impressive improvements in recent years, most methods only consider single-source (ground or aerial) meta-data. However, there are several advantages to integrating ground and aerial meta-data for both localization and reconstruction tasks. Here, two examples are given. (1) For geo-localizing a ground image, aerial meta-data is more appropriate as the localization reference, because compared with the non-uniformly sampled ground meta-data, it provides more complete coverage of the scene. (2) For reconstructing an architectural scene, ground and aerial meta-data are complementary, as they provide close-range observations and large-scale coverage of the scene, respectively. By integrating them, both the detail and the completeness of the reconstruction are guaranteed. Note that in this paper, the ground meta-data considered includes images (e.g. street-view images or images downloaded from the Internet) and LIDAR data, while the aerial meta-data likewise includes images (e.g. satellite images or oblique bird’s eye view images) and LIDAR data.
Despite these advantages, the integration of ground and aerial meta-data is non-trivial, and several difficulties need to be considered. For finding 2D point matches (image matching), the differences between ground and aerial images in viewpoint, scale, illumination, etc. are notable. As a result, traditional 2D local features, e.g. scale invariant feature transform (SIFT) [54], speeded-up robust features (SURF) [4] and affine SIFT (ASIFT) [63], are inadequate for this image matching problem. In addition, local geometrical constraint based robust feature matching modules [55], [56] do not achieve desirable results either, as they focus not on handling the differences between ground and aerial images but on removing large proportions of outliers from the putative correspondence set. For obtaining 3D point correspondences (model registration), the discrepancies between ground and aerial LIDAR data, or between the point clouds generated from ground and aerial images, in terms of point density, accuracy, noise level, etc. are also very large. Thus, neither traditional 3D local features, e.g. spin images (SI) [33], fast point feature histograms (FPFH) [82] and rotational projection statistics (RoPS) [28], nor the commonly used iterative closest point (ICP) algorithms, e.g. point-to-point ICP [6] and point-to-plane ICP [12], can deal with this model registration problem. Apart from these low-level feature based methods, point correspondence can also be treated as a graph matching problem [112], [113]. Though effective in many cases, graph matching based methods are likewise inappropriate for finding correspondences between ground and aerial meta-data with such notable differences.
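For reference, the point-to-point ICP baseline that the text argues is insufficient for cross-source registration alternates nearest-neighbor matching with a closed-form rigid alignment (the Kabsch algorithm). The following is a minimal sketch on a toy, noise-free cloud pair with a small initial misalignment; all names are hypothetical, matching is brute force, and none of the robustness machinery real systems need is included:

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t mapping src onto dst
    (Kabsch algorithm), given one-to-one correspondences."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:             # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, mu_d - R @ mu_s

def icp_point_to_point(src, dst, iters=20):
    """Minimal point-to-point ICP: alternate nearest-neighbor matching
    and closed-form rigid alignment."""
    cur = src.copy()
    for _ in range(iters):
        # Brute-force nearest neighbors (fine for small toy clouds).
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
        matched = dst[d2.argmin(axis=1)]
        R, t = best_rigid_transform(cur, matched)
        cur = cur @ R.T + t
    # Composite transform from the original src to the final alignment.
    return best_rigid_transform(src, cur)

rng = np.random.default_rng(1)
dst = rng.uniform(-1, 1, size=(50, 3))
angle = 0.04                             # small initial misalignment
Rz = np.array([[np.cos(angle), -np.sin(angle), 0.0],
               [np.sin(angle),  np.cos(angle), 0.0],
               [0.0, 0.0, 1.0]])
src = (dst - np.array([0.02, 0.01, 0.0])) @ Rz  # perturbed copy of dst
R_est, t_est = icp_point_to_point(src, dst)
aligned = src @ R_est.T + t_est
```

This only converges because the two clouds here are identical up to a small rigid motion; with the density, noise and coverage discrepancies of real ground and aerial data, the nearest-neighbor step produces systematically wrong correspondences, which is exactly why plain ICP fails in that setting.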
Recently, driven by practical needs in areas such as autonomous driving and outdoor AR, the integration of ground and aerial meta-data for localization and reconstruction has received more and more attention. Many methods have been proposed for the localization or reconstruction problem and have achieved exciting results; they are reviewed respectively in this paper. The rest of this paper is organized as follows. Section 2 reviews the methods of integrating ground and aerial meta-data for localization, and Section 3 reviews the methods of integrating ground and aerial meta-data for reconstruction. Section 4 discusses the current datasets, evaluation metrics and future trends in integrating ground and aerial meta-data for both localization and reconstruction. Finally, Section 5 presents some concluding remarks.
Integrating ground and aerial meta-data for localization
As described in Section 1, localization methods are roughly divided into image based methods and structure based methods. Several methods belonging to these two categories have been proposed to integrate ground and aerial meta-data for localization. They are reviewed in this section.
Integrating ground and aerial meta-data for reconstruction
As described in Section 1, reconstruction methods broadly fall into two types: image based methods and laser based methods. Roughly speaking, methods from the computer vision community usually use images for reconstruction, while methods from the remote sensing community usually prefer integrating ground and aerial LIDAR data. These two kinds of methods are reviewed in this section.
Discussion
In this section, the currently available datasets, the associated evaluation metrics, and the potential future trends in integrating ground and aerial meta-data for localization and reconstruction are discussed.
Conclusion
In this paper, methods of integrating ground and aerial meta-data for localization and reconstruction are reviewed respectively. First, the localization methods of integrating ground and aerial meta-data are divided into image based methods and structure based methods, and are reviewed. Then, the reconstruction methods are divided into image based methods and laser based methods, and are reviewed. Finally, several methods proposed to deal with the difficulties of integrating ground and aerial meta-data are discussed.
Conflict of interest
The authors declare that they have no conflict of interest regarding this work.
Acknowledgements
This work was partially supported by the National Natural Science Foundation of China (NSFC) under grants 61632003, 61333015 and 61421004, partially supported by the Henan Science and Technology Innovation Outstanding Youth Program under grant 184100510009, and partially supported by the Henan University Scientific and Technological Innovation Team Support Program under grant 19IRTSTHN012.
References (119)
- et al., Speeded-up robust features (SURF), Comput. Vision Image Understanding, 2008.
- et al., Tracks selection for robust, efficient and scalable large-scale structure from motion, Pattern Recognit., 2017.
- et al., Extraction of lines from laser point clouds, International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 2006.
- et al., Articulated shape matching using Laplacian eigenfunctions and unsupervised point registration, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008.
- et al., Visual features for vehicle localization and ego-motion estimation, IEEE Intelligent Vehicles Symposium, 2009.
- et al., Cross-view image matching for geo-localization in urban environments, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
- et al., NetVLAD: CNN architecture for weakly supervised place recognition, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
- et al., Ultra-wide baseline facade matching for geo-localization, European Conference on Computer Vision Workshops (ECCVW), 2012.
- et al., Geo-localization of street views with aerial image databases, ACM International Conference on Multimedia (MM), 2011.
- et al., Shape matching and object recognition using shape contexts, IEEE Trans. Pattern Anal. Mach. Intell., 2002.
- A method for registration of 3-D shapes, IEEE Trans. Pattern Anal. Mach. Intell.
- Efficient volumetric fusion of airborne and street-side data for urban reconstruction, International Conference on Pattern Recognition (ICPR).
- Efficient integration of aerial and terrestrial laser data, International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences.
- Fast approximate energy minimization via graph cuts, IEEE Trans. Pattern Anal. Mach. Intell.
- Semantic cross-view matching, IEEE International Conference on Computer Vision Workshop (ICCVW).
- Robust relative rotation averaging, IEEE Trans. Pattern Anal. Mach. Intell.
- Object modeling by registration of multiple range images, IEEE International Conference on Robotics and Automation.
- Semi-automatic registration of airborne and terrestrial laser scanning data using building corner matching with boundaries as reliability check, Remote Sens.
- Learning a similarity metric discriminatively, with application to face verification, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- A variable-resolution probabilistic three-dimensional model for change detection, IEEE Trans. Geosci. Remote Sens.
- HSfM: Hybrid structure-from-motion, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- Global fusion of generalized camera model for efficient large-scale structure from motion, Sci. China Inf. Sci.
- Multisensor fusion for volumetric reconstruction of large outdoor areas, IEEE International Conference on 3-D Digital Imaging and Modeling (3DIM).
- Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography, Commun. ACM.
- Air-ground localization and map augmentation using monocular dense reconstruction, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
- 3D model generation for cities using aerial photographs and ground level laser scans, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- Constructing 3D city models by merging ground-based and airborne views, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- Accurate, dense, and robust multiview stereopsis, IEEE Trans. Pattern Anal. Mach. Intell.
- Accurate and efficient ground-to-aerial model alignment, Pattern Recognit.
- Ancient Chinese architecture 3D preservation by merging ground and aerial point clouds, ISPRS J. Photogramm. Remote Sens.
- Rotational projection statistics for 3D local surface description and object recognition, Int. J. Comput. Vis.
- Extracting building footprints from 3D point clouds using terrestrial laser scanning at street level, International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences.
- Line-based registration of terrestrial and airborne LIDAR data, International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences.
- CVM-Net: cross-view matching network for image-based ground-to-aerial geo-localization, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- Sensor fusion: generating 3D by combining airborne and tripod mounted LIDAR data, International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences.
- Using spin images for efficient object recognition in cluttered 3D scenes, IEEE Trans. Pattern Anal. Mach. Intell.
- Dual contouring of Hermite data, Proceedings of ACM SIGGRAPH.
- Alignment of 3D point clouds to overhead images, IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
- An efficient algebraic solution to the perspective-three-point problem, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- Terrestrial and aerial laser scanning data integration using wavelet analysis for the purpose of 3D building modeling, Sensors.
- PoseNet: A convolutional network for real-time 6-DOF camera relocalization, IEEE International Conference on Computer Vision (ICCV).
- Tanks and temples: benchmarking large-scale scene reconstruction, ACM Trans. Graph.
- The TUM-DLR multimodal earth observation evaluation benchmark, IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2016.
- The Hungarian method for the assignment problem, Nav. Res. Logist.
- Robust and efficient surface reconstruction from range data, Comput. Graphics Forum.
- Deep learning, Nature.
- Localization in urban environments by matching ground level video images with an aerial image, IEEE International Conference on Robotics and Automation (ICRA).
- Planar structure matching under projective uncertainty for geolocation, European Conference on Computer Vision (ECCV).
- Pairwise geometric matching for large-scale object retrieval, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- Cross-view image geolocalization, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).