Ground and aerial meta-data integration for localization and reconstruction: A review
Introduction
Localization and reconstruction are two closely related research areas. Localization determines the location of the data acquisition equipment, e.g. a camera or laser scanner, within the scene, while reconstruction recovers the 3D information of the scene from the acquired meta-data, e.g. images or light detection and ranging (LIDAR) data.
Currently, approaches that tackle the localization problem fall roughly into two categories: image based methods and structure based methods. Image based methods cast localization as an image retrieval task and represent a scene as a database of geo-tagged images [62], [83], [98]. The location of the query image is then approximated by the geo-tag of the most relevant retrieved image; this process is also termed image geo-localization. In contrast, structure based methods cast localization as a camera resection task [84], [94], [116]: by matching features, the pose (location and orientation) of the data acquisition equipment (camera or laser scanner) is recovered. The estimated pose is relatively accurate and is typically used for robot localization or augmented reality (AR).
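The retrieval formulation above can be sketched in a few lines. The following is an illustrative toy, not taken from any reviewed method: it assumes image descriptors (e.g. CNN embeddings such as those produced by NetVLAD) have already been computed, and approximates the query location by the geo-tag of the nearest database descriptor under cosine similarity; all function and variable names are hypothetical.

```python
import numpy as np

def geolocalize_by_retrieval(query_desc, db_descs, db_geotags):
    """Approximate the query location by the geo-tag of the most
    similar database image (nearest neighbor in descriptor space)."""
    # Cosine similarity between the query and every database descriptor.
    q = query_desc / np.linalg.norm(query_desc)
    db = db_descs / np.linalg.norm(db_descs, axis=1, keepdims=True)
    sims = db @ q
    best = int(np.argmax(sims))
    return db_geotags[best], sims[best]

# Toy example: 3 database images with (lat, lon) geo-tags.
rng = np.random.default_rng(0)
db_descs = rng.normal(size=(3, 128))
db_geotags = [(48.858, 2.294), (40.689, -74.044), (51.501, -0.124)]
query = db_descs[1] + 0.01 * rng.normal(size=128)  # slightly perturbed copy of image 1
loc, score = geolocalize_by_retrieval(query, db_descs, db_geotags)
```

In practice the database is large and the brute-force similarity search is replaced by an approximate nearest-neighbor index, but the geo-tag transfer step is the same.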
According to the involved meta-data, scene reconstruction approaches broadly fall into two types: image based methods and laser based methods. The pipeline of image based methods contains: (1) structure-from-motion (SfM) [11], [16], [17], [18], [86] to calibrate camera poses, (2) multi-view stereo (MVS) [87], [92] to obtain a dense point cloud and (3) surface reconstruction [71], [101] to obtain a surface mesh. To achieve more delicate scene reconstructions, LIDAR data is sometimes involved [66], [103], [117]. Compared with image based methods, laser based methods are more accurate and less dependent on the circumstances, but they are costlier and less flexible.
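A core geometric step shared by the SfM and MVS stages of this pipeline is triangulating a 3D point from calibrated views. As a minimal sketch under simplifying assumptions (identity intrinsics, noise-free observations, names hypothetical), linear direct-linear-transform (DLT) triangulation from two views can be written as:

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two views.
    P1, P2: 3x4 projection matrices; x1, x2: 2D image observations."""
    # Each observation contributes two linear constraints on the
    # homogeneous 3D point X: x * (P[2]·X) - P[0]·X = 0, etc.
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector of A with the
    # smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

# Toy setup: identity intrinsics, second camera translated along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.5, 0.2, 4.0])
x1 = P1 @ np.append(X_true, 1.0)   # project into view 1
x1 = x1[:2] / x1[2]
x2 = P2 @ np.append(X_true, 1.0)   # project into view 2
x2 = x2[:2] / x2[2]
X_est = triangulate_dlt(P1, P2, x1, x2)
```

Real pipelines follow the linear estimate with a non-linear refinement (bundle adjustment) over all cameras and points, but the DLT step above captures the underlying geometry.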
Though both localization and reconstruction methods have achieved impressive improvements in recent years, most methods only consider single-source (ground or aerial) meta-data. However, there are several advantages to integrating ground and aerial meta-data for both localization and reconstruction tasks. Here, two examples are given. (1) For geo-localizing a ground image, aerial meta-data is more appropriate as the localization reference, because compared with the non-uniformly sampled ground meta-data, it provides more complete coverage of the scene. (2) For reconstructing an architectural scene, ground and aerial meta-data are complementary, as they provide close-range observations and large-scale coverage of the scene, respectively. By integrating them, both the detail and the completeness of the reconstruction are guaranteed. Note that in this paper, the ground meta-data considered includes images (e.g. street-view images or images downloaded from the Internet) and LIDAR data, while the aerial meta-data likewise includes images (e.g. satellite images or oblique bird’s eye view images) and LIDAR data.
Despite these advantages, the integration of ground and aerial meta-data is non-trivial, and several difficulties need to be considered. For finding 2D point matches (image matching), the differences between ground and aerial images in viewpoint, scale, illumination, etc. are notable. As a result, traditional 2D local features, e.g. scale invariant feature transform (SIFT) [54], speeded-up robust features (SURF) [4] and affine SIFT (ASIFT) [63], are inadequate for this image matching problem. In addition, local geometrical constraint based robust feature matching modules [55], [56] do not achieve desirable results either, as they focus not on handling the differences between ground and aerial images but on removing large proportions of outliers from the putative correspondence set. For obtaining 3D point correspondences (model registration), the discrepancies between ground and aerial LIDAR data, or between the point clouds generated from ground and aerial images, in terms of point density, accuracy, noise level, etc. are also very large. Thus, neither traditional 3D local features, e.g. spin images (SI) [33], fast point feature histograms (FPFH) [82] and rotational projection statistics (RoPS) [28], nor the commonly used iterative closest point (ICP) algorithms, e.g. point-to-point ICP [6] and point-to-plane ICP [12], can deal with this model registration problem. Apart from these low-level feature based methods, point correspondence can also be treated as a graph matching problem [112], [113]. Though effective in many cases, graph matching based methods are likewise inappropriate for finding correspondences between ground and aerial meta-data with such notable differences.
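For reference, the point-to-point ICP baseline that the text argues is insufficient for cross-source registration alternates nearest-neighbor matching with a closed-form rigid alignment (the Kabsch algorithm). The following is a minimal sketch on a toy, noise-free cloud pair with a small initial misalignment; all names are hypothetical, matching is brute force, and none of the robustness machinery real systems need is included:

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t mapping src onto dst
    (Kabsch algorithm), given one-to-one correspondences."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:             # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, mu_d - R @ mu_s

def icp_point_to_point(src, dst, iters=20):
    """Minimal point-to-point ICP: alternate nearest-neighbor matching
    and closed-form rigid alignment."""
    cur = src.copy()
    for _ in range(iters):
        # Brute-force nearest neighbors (fine for small toy clouds).
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
        matched = dst[d2.argmin(axis=1)]
        R, t = best_rigid_transform(cur, matched)
        cur = cur @ R.T + t
    # Composite transform from the original src to the final alignment.
    return best_rigid_transform(src, cur)

rng = np.random.default_rng(1)
dst = rng.uniform(-1, 1, size=(50, 3))
angle = 0.04                             # small initial misalignment
Rz = np.array([[np.cos(angle), -np.sin(angle), 0.0],
               [np.sin(angle),  np.cos(angle), 0.0],
               [0.0, 0.0, 1.0]])
src = (dst - np.array([0.02, 0.01, 0.0])) @ Rz  # perturbed copy of dst
R_est, t_est = icp_point_to_point(src, dst)
aligned = src @ R_est.T + t_est
```

This only converges because the two clouds here are identical up to a small rigid motion; with the density, noise and coverage discrepancies of real ground and aerial data, the nearest-neighbor step produces systematically wrong correspondences, which is exactly why plain ICP fails in that setting.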
Recently, driven by practical needs in areas such as autonomous driving and outdoor AR, the integration of ground and aerial meta-data for localization and reconstruction has received more and more attention. Many methods have been proposed for the localization or reconstruction problem and have achieved exciting results; they are reviewed respectively in this paper. The rest of this paper is organized as follows. Section 2 reviews the methods of integrating ground and aerial meta-data for localization, and Section 3 reviews the methods of integrating ground and aerial meta-data for reconstruction. Section 4 discusses the current datasets, evaluation metrics and future trends in integrating ground and aerial meta-data for both localization and reconstruction. Finally, Section 5 presents some concluding remarks.
Integrating ground and aerial meta-data for localization
As described in Section 1, localization methods are roughly divided into image based methods and structure based methods. Several methods belonging to these two categories have been proposed to integrate ground and aerial meta-data for localization. They are reviewed in this section.
Integrating ground and aerial meta-data for reconstruction
As described in Section 1, reconstruction methods broadly fall into two types: image based methods and laser based methods. Roughly speaking, methods from the computer vision community usually use images for reconstruction, while methods from the remote sensing community usually prefer integrating ground and aerial LIDAR data. These two kinds of methods are reviewed in this section.
Discussion
In this section, the currently available datasets, the associated evaluation metrics, and the potential future trends in integrating ground and aerial meta-data for localization and reconstruction are discussed.
Conclusion
In this paper, methods of integrating ground and aerial meta-data for localization and reconstruction are reviewed respectively. First, the localization methods of integrating ground and aerial meta-data are divided into image based methods and structure based methods, and are reviewed. Then, the reconstruction methods are divided into image based methods and laser based methods, and are reviewed. Finally, several methods proposed to deal with the difficulties of integrating ground and aerial meta-data are discussed.
Conflict of interest
The authors declare that they have no conflict of interest regarding this work.
Acknowledgements
This work was partially supported by the National Natural Science Foundation of China (NSFC) under grants 61632003, 61333015 and 61421004, partially supported by the Henan Science and Technology Innovation Outstanding Youth Program under grant 184100510009, and partially supported by the Henan University Scientific and Technological Innovation Team Support Program under grant 19IRTSTHN012.
References (119)
- et al., Speeded-up robust features (SURF), Comput. Vision Image Understanding, 2008.
- et al., Tracks selection for robust, efficient and scalable large-scale structure from motion, Pattern Recognit., 2017.
- et al., Extraction of lines from laser point clouds, International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 2006.
- et al., Articulated shape matching using Laplacian eigenfunctions and unsupervised point registration, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008.
- et al., Visual features for vehicle localization and ego-motion estimation, IEEE Intelligent Vehicles Symposium, 2009.
- et al., Cross-view image matching for geo-localization in urban environments, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
- et al., NetVLAD: CNN architecture for weakly supervised place recognition, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
- et al., Ultra-wide baseline facade matching for geo-localization, European Conference on Computer Vision Workshops (ECCVW), 2012.
- et al., Geo-localization of street views with aerial image databases, ACM International Conference on Multimedia (MM), 2011.
- et al., Shape matching and object recognition using shape contexts, IEEE Trans. Pattern Anal. Mach. Intell., 2002.
- A method for registration of 3-D shapes, IEEE Trans. Pattern Anal. Mach. Intell.
- Efficient volumetric fusion of airborne and street-side data for urban reconstruction, International Conference on Pattern Recognition (ICPR).
- Efficient integration of aerial and terrestrial laser data, International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences.
- Fast approximate energy minimization via graph cuts, IEEE Trans. Pattern Anal. Mach. Intell.
- Semantic cross-view matching, IEEE International Conference on Computer Vision Workshop (ICCVW).
- Robust relative rotation averaging, IEEE Trans. Pattern Anal. Mach. Intell.
- Object modeling by registration of multiple range images, IEEE International Conference on Robotics and Automation.
- Semi-automatic registration of airborne and terrestrial laser scanning data using building corner matching with boundaries as reliability check, Remote Sens.
- Learning a similarity metric discriminatively, with application to face verification, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- A variable-resolution probabilistic three-dimensional model for change detection, IEEE Trans. Geosci. Remote Sens.
- HSfM: Hybrid structure-from-motion, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- Global fusion of generalized camera model for efficient large-scale structure from motion, Sci. China Inf. Sci.
- Multisensor fusion for volumetric reconstruction of large outdoor areas, IEEE International Conference on 3-D Digital Imaging and Modeling (3DIM).
- Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography, Commun. ACM.
- Air-ground localization and map augmentation using monocular dense reconstruction, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
- 3D model generation for cities using aerial photographs and ground level laser scans, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- Constructing 3D city models by merging ground-based and airborne views, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- Accurate, dense, and robust multiview stereopsis, IEEE Trans. Pattern Anal. Mach. Intell.
- Accurate and efficient ground-to-aerial model alignment, Pattern Recognit.
- Ancient Chinese architecture 3D preservation by merging ground and aerial point clouds, ISPRS J. Photogramm. Remote Sens.
- Rotational projection statistics for 3D local surface description and object recognition, Int. J. Comput. Vis.
- Extracting building footprints from 3D point clouds using terrestrial laser scanning at street level, International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences.
- Line-based registration of terrestrial and airborne LIDAR data, International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences.
- CVM-Net: cross-view matching network for image-based ground-to-aerial geo-localization, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- Sensor fusion: generating 3D by combining airborne and tripod mounted LIDAR data, International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences.
- Using spin images for efficient object recognition in cluttered 3D scenes, IEEE Trans. Pattern Anal. Mach. Intell.
- Dual contouring of Hermite data, Proceedings of ACM SIGGRAPH.
- Alignment of 3D point clouds to overhead images, IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
- An efficient algebraic solution to the perspective-three-point problem, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- Terrestrial and aerial laser scanning data integration using wavelet analysis for the purpose of 3D building modeling, Sensors.
- PoseNet: A convolutional network for real-time 6-DOF camera relocalization, IEEE International Conference on Computer Vision (ICCV).
- Tanks and temples: benchmarking large-scale scene reconstruction, ACM Trans. Graph.
- The TUM-DLR multimodal earth observation evaluation benchmark, IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2016.
- The Hungarian method for the assignment problem, Nav. Res. Logist.
- Robust and efficient surface reconstruction from range data, Comput. Graphics Forum.
- Deep learning, Nature.
- Localization in urban environments by matching ground level video images with an aerial image, IEEE International Conference on Robotics and Automation (ICRA).
- Planar structure matching under projective uncertainty for geolocation, European Conference on Computer Vision (ECCV).
- Pairwise geometric matching for large-scale object retrieval, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- Cross-view image geolocalization, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).