Pattern Recognition Letters

Volume 127, 1 November 2019, Pages 202-214

Ground and aerial meta-data integration for localization and reconstruction: A review

https://doi.org/10.1016/j.patrec.2018.07.036

Highlights

  • Methods of integrating ground and aerial meta-data for localization and reconstruction are reviewed.

  • Localization methods are reviewed in two categories: image based methods and structure based methods.

  • Reconstruction methods are reviewed in two categories: image based methods and laser based methods.

Abstract

Localization and reconstruction are two closely related research areas, and both have developed rapidly in recent years. With the help of ground and aerial meta-data integration, the performance of both localization and reconstruction can go a step further. For localization, aerial meta-data provides a global reference, against which a ground query can be localized absolutely and free of cumulative error. For reconstruction, a complete and detailed model can be obtained by integrating ground and aerial meta-data. Despite these advantages, the integration itself is non-trivial: ground-to-aerial correspondences are difficult to obtain in either a 2D or a 3D manner, because (1) the differences between ground and aerial images in viewpoint, scale, illumination, etc. are notable, and (2) the discrepancies between ground and aerial point clouds in terms of point density, accuracy, noise level, etc. are very large. Many methods have recently been proposed to deal with these problems. In this paper, the methods of integrating ground and aerial meta-data for localization and for reconstruction are reviewed respectively. Although many high-quality intermediate results have been achieved, we hope that, inspired by the methods reviewed in this paper, more thorough methods and more impressive results will emerge.

Introduction

Localization and reconstruction are two highly related research areas. Localization determines the location of the data acquisition equipment (e.g. a camera or laser scanner) in the scene, while reconstruction recovers the 3D information of the scene from the acquired meta-data, e.g. images or light detection and ranging (LIDAR) data.

Currently, approaches that tackle the localization problem are roughly divided into two categories: image based methods and structure based methods. Image based methods cast the localization problem as an image retrieval task and represent a scene as a database of geo-tagged images [62], [83], [98]. The location of the query image is usually approximated by the geo-tag of the most relevant retrieved image; this process is also termed image geo-localization. In contrast, structure based methods cast the localization problem as a camera resection task [84], [94], [116]: by matching features, the pose (location and orientation) of the data acquisition equipment (camera or laser scanner) is recovered. The estimated pose is relatively accurate and is usually used for robot localization or augmented reality (AR).
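
To make the two formulations concrete, the following minimal Python sketch illustrates both: image geo-localization as nearest-neighbor retrieval over geo-tagged descriptors, and structure based localization as robust camera resection (PnP inside RANSAC, here via OpenCV's solvePnPRansac). The function names and the plain descriptor vectors are illustrative assumptions, not any specific reviewed method; real systems use learned or aggregated image features.

import numpy as np
import cv2

def geolocalize(query_desc, db_descs, db_geotags):
    """Image based: approximate the query location by the geo-tag of the
    most relevant retrieved image.
    query_desc: (d,) query descriptor; db_descs: (n, d); db_geotags: (n, 2)."""
    q = query_desc / np.linalg.norm(query_desc)
    db = db_descs / np.linalg.norm(db_descs, axis=1, keepdims=True)
    return db_geotags[int(np.argmax(db @ q))]  # cosine-nearest reference image

def resect_camera(pts3d, pts2d, K):
    """Structure based: recover the full 6-DOF camera pose from 2D-3D
    feature matches. pts3d: (n, 3) scene points; pts2d: (n, 2) pixels;
    K: 3x3 camera intrinsics."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts3d.astype(np.float64), pts2d.astype(np.float64),
        K.astype(np.float64), None)
    return rvec, tvec  # axis-angle rotation and translation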

According to the involved meta-data, scene reconstruction approaches broadly fall into two types: image based methods and laser based methods. The pipeline of image based methods contains: (1) structure-from-motion (SfM) [11], [16], [17], [18], [86] to calibrate camera poses, (2) multi-view stereo (MVS) [87], [92] to obtain a dense point cloud and (3) surface reconstruction [71], [101] to obtain a surface mesh. To achieve more delicate scene reconstructions, LIDAR data is sometimes involved [66], [103], [117]. Compared with image based methods, laser based methods are more accurate and less dependent on environmental conditions, but they are costlier and less flexible.
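
As a concrete illustration of this three-stage pipeline, the sketch below drives COLMAP (one widely used open-source SfM/MVS implementation, not necessarily the tool used by the reviewed methods) from Python; the paths are illustrative, and the subcommands follow COLMAP's documented command-line interface.

import subprocess

def colmap(*args: str) -> None:
    """Run one COLMAP subcommand and fail loudly if it errors."""
    subprocess.run(["colmap", *args], check=True)

# (1) Structure-from-motion: detect features, match them, and recover
# camera poses plus a sparse point cloud.
colmap("feature_extractor", "--database_path", "db.db", "--image_path", "images")
colmap("exhaustive_matcher", "--database_path", "db.db")
colmap("mapper", "--database_path", "db.db", "--image_path", "images",
       "--output_path", "sparse")

# (2) Multi-view stereo: estimate dense depth maps and fuse them into a
# dense point cloud.
colmap("image_undistorter", "--image_path", "images",
       "--input_path", "sparse/0", "--output_path", "dense")
colmap("patch_match_stereo", "--workspace_path", "dense")
colmap("stereo_fusion", "--workspace_path", "dense",
       "--output_path", "dense/fused.ply")

# (3) Surface reconstruction: extract a surface mesh from the fused cloud.
colmap("poisson_mesher", "--input_path", "dense/fused.ply",
       "--output_path", "dense/meshed.ply")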

Though both localization and reconstruction methods have achieved impressive improvements in recent years, most methods only consider single-source (ground or aerial) meta-data. However, integrating ground and aerial meta-data offers several advantages for both localization and reconstruction tasks. Two examples are given here. (1) For geo-localizing a ground image, aerial meta-data is a more appropriate localization reference, because, compared with non-uniformly sampled ground meta-data, it provides more complete coverage of the scene. (2) For reconstructing an architectural scene, ground and aerial meta-data are complementary, as they provide close-range observation and large-scale coverage of the scene respectively; by integrating them, both the detail and the completeness of the reconstruction are guaranteed. Note that in this paper, the ground meta-data considered includes images (e.g. street-view images or images downloaded from the Internet) and LIDAR data, while the aerial meta-data likewise includes images (e.g. satellite images or oblique bird’s eye view images) and LIDAR data.

Despite these advantages, the integration of ground and aerial meta-data is non-trivial, and several difficulties need to be considered. For finding 2D point matches (image matching), the differences between ground and aerial images in viewpoint, scale, illumination, etc. are notable. As a result, the traditional 2D local features, e.g. scale invariant feature transform (SIFT) [54], speeded-up robust features (SURF) [4] and affine SIFT (ASIFT) [63], are inadequate for this image matching problem. In addition, the local geometrical constraint based robust feature matching modules [55], [56] would not achieve desirable results either, as they focus not on handling the differences between ground and aerial images, but on removing large proportions of outliers from the putative correspondence set. For obtaining 3D point correspondences (model registration), the discrepancies between the ground and aerial LIDAR data, or between the point clouds generated from ground and aerial images, in terms of point density, accuracy, noise level, etc. are very large as well. Thus, neither the traditional 3D local features, e.g. spin images (SI) [33], fast point feature histograms (FPFH) [82] and rotational projection statistics (RoPS) [28], nor the commonly used iterative closest point (ICP) algorithm, e.g. point-to-point ICP [6] and point-to-plane ICP [12], can deal with the model registration problem. Apart from these low-level feature based methods, point correspondence can also be treated as a graph matching problem [112], [113]. Though effective in many cases, graph matching based methods are not appropriate for finding correspondences between ground and aerial meta-data with such notable differences.
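
For reference, the following NumPy sketch implements the vanilla point-to-point ICP baseline [6]; it is exactly this closest-point iteration that breaks down when the two clouds differ strongly in density, noise and coverage, because the nearest neighbor of a ground point is then rarely its true aerial counterpart. Brute-force matching is used for brevity; practical implementations use k-d trees and outlier rejection.

import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t mapping src onto dst
    (Kabsch/SVD solution); src and dst are matched (n, 3) arrays."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    U, _, Vt = np.linalg.svd((src - cs).T @ (dst - cd))
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:      # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cd - R @ cs

def icp_point_to_point(src, dst, iters=30):
    """Iteratively align src (n, 3) to dst (m, 3); returns the moved src."""
    cur = src.copy()
    for _ in range(iters):
        # Correspondences = closest points (quadratic cost, small clouds only).
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(axis=-1)
        R, t = best_rigid_transform(cur, dst[d2.argmin(axis=1)])
        cur = cur @ R.T + t
    return cur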

Recently, driven by practical needs in areas such as autonomous driving and outdoor AR, the integration of ground and aerial meta-data for localization and reconstruction has received more and more attention. Many methods have been proposed to deal with the localization or reconstruction problem and have achieved exciting results; they are reviewed respectively in this paper. The rest of this paper is organized as follows. Section 2 reviews the methods of integrating ground and aerial meta-data for localization, and Section 3 reviews the methods of integrating ground and aerial meta-data for reconstruction. Section 4 discusses the current datasets, evaluation metrics and future trends in integrating ground and aerial meta-data for both localization and reconstruction. Finally, Section 5 presents some concluding remarks.

Section snippets

Integrating ground and aerial meta-data for localization

As described in Section 1, localization methods are roughly divided into image based methods and structure based methods. Several methods belonging to these two categories have been proposed to integrate ground and aerial meta-data for localization; they are reviewed in this section.

Integrating ground and aerial meta-data for reconstruction

As described in Section 1, reconstruction methods broadly fall into two types: image based methods and laser based methods. Roughly speaking, methods from the computer vision community usually use images for reconstruction, while methods from the remote sensing community usually prefer integrating ground and aerial LIDAR data. These two kinds of methods are reviewed in this section.

Discussion

In this section, the currently available datasets and evaluation metrics, together with potential future trends in integrating ground and aerial meta-data for localization and reconstruction, are discussed.

Conclusion

In this paper, methods of integrating ground and aerial meta-data for localization and reconstruction are reviewed respectively. First, the localization methods of integrating ground and aerial meta-data are divided into image based methods and structure based methods, and are reviewed. Then, the reconstruction methods are divided into image based methods and laser based methods, and are reviewed. To deal with the difficulties of integrating ground and aerial meta-data, several proposed methods

Conflict of interest

The authors declare that they have no conflict of interest related to this work.

Acknowledgements

This work was partially supported by the National Natural Science Foundation of China (NSFC) under grants 61632003, 61333015 and 61421004, by the Henan Science and Technology Innovation Outstanding Youth Program under grant 184100510009, and by the Henan University Scientific and Technological Innovation Team Support Program under grant 19IRTSTHN012.

References (119)

  • P.J. Besl et al., A method for registration of 3-D shapes, IEEE Trans. Pattern Anal. Mach. Intell. (1992)
  • A. Bódis-Szomorú et al., Efficient volumetric fusion of airborne and street-side data for urban reconstruction, International Conference on Pattern Recognition (ICPR) (2016)
  • J. Böhm et al., Efficient integration of aerial and terrestrial laser data, International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences (2005)
  • Y. Boykov et al., Fast approximate energy minimization via graph cuts, IEEE Trans. Pattern Anal. Mach. Intell. (2001)
  • F. Castaldo et al., Semantic cross-view matching, IEEE International Conference on Computer Vision Workshop (ICCVW) (2015)
  • A. Chatterjee et al., Robust relative rotation averaging, IEEE Trans. Pattern Anal. Mach. Intell. (2018)
  • Y. Chen et al., Object modeling by registration of multiple range images, IEEE International Conference on Robotics and Automation (1991)
  • L. Cheng et al., Semi-automatic registration of airborne and terrestrial laser scanning data using building corner matching with boundaries as reliability check, Remote Sens. (2013)
  • S. Chopra et al., Learning a similarity metric discriminatively, with application to face verification, IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2005)
  • D. Crispell et al., A variable-resolution probabilistic three-dimensional model for change detection, IEEE Trans. Geosci. Remote Sens. (2012)
  • H. Cui et al., HSfM: Hybrid structure-from-motion, IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
  • H. Cui et al., Global fusion of generalized camera model for efficient large-scale structure from motion, Sci. China Inf. Sci. (2016)
  • M. Fiocco et al., Multisensor fusion for volumetric reconstruction of large outdoor areas, IEEE International Conference on 3-D Digital Imaging and Modeling (3DIM) (2005)
  • M.A. Fischler et al., Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography, Commun. ACM (1981)
  • C. Forster et al., Air-ground localization and map augmentation using monocular dense reconstruction, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (2013)
  • C. Frueh et al., 3D model generation for cities using aerial photographs and ground level laser scans, IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2001)
  • C. Frueh et al., Constructing 3D city models by merging ground-based and airborne views, IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2003)
  • Y. Furukawa et al., Accurate, dense, and robust multiview stereopsis, IEEE Trans. Pattern Anal. Mach. Intell. (2010)
  • X. Gao et al., Accurate and efficient ground-to-aerial model alignment, Pattern Recognit. (2018)
  • X. Gao et al., Ancient Chinese architecture 3D preservation by merging ground and aerial point clouds, ISPRS J. Photogramm. Remote Sens. (2018)
  • Y. Guo et al., Rotational projection statistics for 3D local surface description and object recognition, Int. J. Comput. Vis. (2013)
  • K. Hammoudi et al., Extracting building footprints from 3D point clouds using terrestrial laser scanning at street level, International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences (2009)
  • W.V. Hansen et al., Line-based registration of terrestrial and airborne LIDAR data, International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences (2008)
  • S. Hu et al., CVM-Net: cross-view matching network for image-based ground-to-aerial geo-localization, IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018)
  • A. Iavarone et al., Sensor fusion: generating 3D by combining airborne and tripod-mounted LIDAR data, International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences (2003)
  • A.E. Johnson et al., Using spin images for efficient object recognition in cluttered 3D scenes, IEEE Trans. Pattern Anal. Mach. Intell. (1999)
  • T. Ju et al., Dual contouring of Hermite data, Proceedings of ACM SIGGRAPH (2002)
  • R.S. Kaminsky et al., Alignment of 3D point clouds to overhead images, IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (2009)
  • T. Ke et al., An efficient algebraic solution to the perspective-three-point problem, IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
  • M. Kedzierski et al., Terrestrial and aerial laser scanning data integration using wavelet analysis for the purpose of 3D building modeling, Sensors (2014)
  • A. Kendall et al., PoseNet: a convolutional network for real-time 6-DOF camera relocalization, IEEE International Conference on Computer Vision (ICCV) (2015)
  • A. Knapitsch et al., Tanks and temples: benchmarking large-scale scene reconstruction, ACM Trans. Graph. (2017)
  • T. Koch et al., The TUM-DLR multimodal earth observation evaluation benchmark, IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (2016)
  • H.W. Kuhn, The Hungarian method for the assignment problem, Nav. Res. Logist. (2005)
  • P. Labatut et al., Robust and efficient surface reconstruction from range data, Comput. Graphics Forum (2009)
  • Y. LeCun et al., Deep learning, Nature (2015)
  • K.Y.K. Leung et al., Localization in urban environments by matching ground level video images with an aerial image, IEEE International Conference on Robotics and Automation (ICRA) (2008)
  • A. Li et al., Planar structure matching under projective uncertainty for geolocation, European Conference on Computer Vision (ECCV) (2014)
  • X. Li et al., Pairwise geometric matching for large-scale object retrieval, IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015)
  • T.Y. Lin et al., Cross-view image geolocalization, IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2013)