Image and Object Geo-Localization

International Journal of Computer Vision

Abstract

The concept of geo-localization broadly refers to the process of determining an entity’s geographical location, typically in the form of Global Positioning System (GPS) coordinates. The entity of interest may be an image, a sequence of images, a video, a satellite image, or even objects visible within the image. Recently, massive datasets of GPS-tagged media have become available due to smartphones and the internet, and deep learning has risen to prominence and enhanced the performance capabilities of machine learning models. These developments have enabled the rise of image and object geo-localization, which has impacted a wide range of applications such as augmented reality, robotics, self-driving vehicles, road maintenance, and 3D reconstruction. This paper provides a comprehensive survey of visual geo-localization, which may involve either determining the location at which an image has been captured (image geo-localization) or geolocating objects within an image (object geo-localization). We will provide an in-depth study of visual geo-localization including a summary of popular algorithms, a description of proposed datasets, and an analysis of performance results to illustrate the current state of the field.

Notes

  1. www.openstreetmap.org.

  2. www.openstreetmap.org.

  3. http://graphics.cs.cmu.edu/projects/im2gps/.

  4. https://www.flickr.com/.

  5. http://graphics.cs.cmu.edu/projects/im2gps/.

  6. http://www.mediafire.com/file/7ht7sn78q27o9we/im2gps3ktest.zip/file.

  7. http://www.nn4d.com/sanfranciscolandmark.

  8. http://www.cvlibs.net/datasets/kitti/.

  9. http://www.multimediacommons.org/.

  10. http://cphoto.fit.vutbr.cz/geoPose3K/.

  11. http://mvrl.cs.uky.edu/datasets/cvusa/.

  12. https://www.flickr.com/.

  13. https://www.bing.com/maps/.

  14. https://github.com/Liumouliu/OriCNN.

  15. https://developers.google.com/maps/documentation/streetview/overview.

  16. https://github.com/lugiavn/gt-crossview.

  17. https://developers.google.com/maps/documentation/maps-static/overview.

  18. https://www.crcv.ucf.edu/research/cross-view-image-matching-for-geo-localization-in-urban-environments/.

  19. http://www.mapchannels.com/DualMaps.aspx.

  20. https://github.com/Jeff-Zilence/VIGOR.

  21. https://wiki.openstreetmap.org/wiki/Zoom_levels.

  22. https://github.com/layumi/University1652-Baseline.

  23. https://earth.google.com/web/.

  24. https://github.com/MedChaabane/Static_Objects_Geolocalization.

  25. https://drive.google.com/drive/folders/1u_nx38M0_owB0cR-qA6IOWgZhGpb9sWU?usp=sharing.

References

  • Agarwal, S., Furukawa, Y., Snavely, N., Simon, I., Curless, B., Seitz, S. M., & Szeliski, R. (2011). Building rome in a day. Communications of the ACM, 54(10), 105–112. https://doi.org/10.1145/2001269.2001293

    Article  Google Scholar 

  • Almutairy, F., Alshaabi, T., Nelson, J., & Wshah, S. (2021). Arts: Automotive repository of traffic signs for the united states. Institute of Electrical and Electronics Engineers (IEEE) Transactions on Intelligent Transportation Systems, 22(1), 457–465. https://doi.org/10.1109/TITS.2019.2958486

    Article  Google Scholar 

  • Anguelov, D., Dulong, C., Filip, D., Frueh, C., Lafon, S., Lyon, R., & Weaver, J. (2010). Google street view: Capturing the world at street level. Institute of Electrical and Electronics Engineers (IEEE) Computer, 43(6), 32–38.

    Google Scholar 

  • Ankerst, M., Breunig, M. M., Kriegel, H.-P., & Sander, J. (1999). Optics: Ordering points to identify the clustering structure. Proceedings of the 1999 ACM Sigmod International Conference on Management of Data (p. 49–60). Association for Computing Machinery. https://doi.org/10.1145/304182.304187

  • Arandjelović, R., Gronat, P., Torii, A., Pajdla, T., & Sivic, J. (2018). Netvlad: CNN architecture for weakly supervised place recognition. Institute of Electrical and Electronics Engineers (IEEE) Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 40(6), 1437–1451. https://doi.org/10.1109/TPAMI.2017.2711011

    Article  Google Scholar 

  • Baatz, G., Saurer, O., Köser, K., & Pollefeys, M. (2012). Large scale visual geo-localization of images in mountainous terrain. In Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y. & Schmid, C. (Eds.) Computer Vision—ECCV 2012 (pp. 517–530). Springer.

  • Baatz, G., Saurer, O., Köser, K., & Pollefeys, M. (2012). Leveraging topographic maps for image to terrain alignment (p. 487-492). https://doi.org/10.1109/3DIMPVT.2012.33

  • Bansal, M., & Daniilidis, K. (2014). Geometric urban geo-localization. In Institute of electrical and electronics engineers (ieee) conference on computer vision and pattern recognition (CVPR) (p. 3978–3985). https://doi.org/10.1109/CVPR.2014.508

  • Benbihi, A., Arravechia, S., Geist, M., & Pradalier, C. (2020). Image-based place recognition on bucolic environment across seasons from semantic edge description (pp. 3032–3038). https://doi.org/10.1109/ICRA40945.2020.9197529

  • Brejcha, J., & Cadik, M. (2017). Geopose3k: Mountain landscape dataset for camera pose estimation in outdoor environments. Image and Vision Computing, 66, 1. https://doi.org/10.1016/j.imavis.2017.05.009

    Article  Google Scholar 

  • Brejcha, J., & Čadík, M. (2017). State-of-the-art in visual geo-localization. Pattern Analysis and Applications, 20(3), 613–637.

    Article  MathSciNet  Google Scholar 

  • Brejcha, J., Lukác, M., Chen, Z., DiVerdi, S., & Cadík, M. (2018). Immersive trip reports. In Proceedings of the 31st Annual ACM symposium on user interface software and technology (pp. 389–401). Association for Computing Machinery. https://doi.org/10.1145/3242587.3242653

  • Brejcha, J., Lukáč, M., Hold-Geoffroy, Y., Wang, O., & Cadik, M. (2020). Landscapear: Large scale outdoor augmented reality by matching photographs with terrain models using learned descriptors (pp. 295–312). https://doi.org/10.1007/978-3-030-58526-6_18

  • Brock, A., Donahue, J., & Simonyan, K. (2019). Large scale GAN training for high fidelity natural image synthesis. International conference on learning representations (ICLR).

  • Bromley, J., Bentz, J. W., Bottou, L., Guyon, I., LeCun, Y., Moore, C., & Shah, R. (1993). Signature verification using a “siamese’’ time delay neural network. International Journal of Pattern Recognition and Artificial Intelligence (IJPRAI), 7(04), 669–688.

    Article  Google Scholar 

  • Caesar, H., Bankiti, V., Lang, A.H., Vora, S., Liong, V.E., Xu, Q., & Beijbom, O. (2020). nuscenes: A multimodal dataset for autonomous driving. Institute of Electrical and Electronics Engineers (IEEE)/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 11618–11628).

  • Cai, S., Guo, Y., Khan, S., Hu, J., & Wen, G. (2019). Ground-to-aerial image geolocalization with a hard exemplar reweighting triplet loss. In Proceedings of the institute of electrical and electronics engineers (IEEE)/cvf international conference on computer vision (ICCV).

  • Castaldo, F., Zamir, A., Angst, R., Palmieri, F., & Savarese, S. (2015). Semantic crossview matching. In Proceedings of the institute of electrical and electronics engineers (IEEE) international conference on computer vision (ICCV) workshops.

  • Chaabane, M., Gueguen, L., Trabelsi, A., Beveridge, R., & O’Hara, S. (2021). End-to-end learning improves static object geo-localization from video. In Proceedings of the IEEE/CVF winter conference on applications of computer vision (WACV) (pp. 2063–2072).

  • Chen, D.M., Baatz, G., Köser, K., Tsai, S.S., Vedantham, R., Pylvänäinen, T., & Grzeszczuk, R. (2011). City-scale landmark identification on mobile devices. Computer vision and pattern recognition (CVPR) (pp. 737–744). https://doi.org/10.1109/CVPR.2011.5995610

  • Chen, W., Liu, Y., Wang, W., Bakker, E., Georgiou, T., Fieguth, P., & Lew, M. (2021). Deep image retrieval: A survey.

  • Chen, Y., Qian, G., Gunda, K., Gupta, H., & Shafique, K. (2015). Camera geolocation from mountain images. In 18th International Conference on Information Fusion (Fusion) (pp. 1587–1596).

  • Chopra, S., Hadsell, R., & LeCun, Y. (2005). Learning a similarity metric discriminatively, with application to face verification. In Institute of electrical and electronics engineers (IEEE) computer society conference on computer vision and pattern recognition (CVPR) (Vol. 1, pp. 539–546).

  • Clark, B., Kerrigan, A., Kulkarni, P., Cepeda, V., & Shah, M. (2023). Where we are and what we’re looking at: Query based worldwide image geo-localization using hierarchies and scenes. https://doi.org/10.48550/arXiv.2303.04249

  • Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., & Schiele, B. (2016). The cityscapes dataset for semantic urban scene understanding. In Proceedings of the institute of electrical and electronics engineers (IEEE) conference on computer vision and pattern recognition (CVPR).

  • Costea, D., & Leordeanu, M. (2016). Aerial image geolocalization from recognition and matching of roads and intersections. Richard, E. R. H., Wilson, C., & Smith, W. A. P. (Eds.) Proceedings of the british machine vision conference (bmvc) (pp. 118.1–118.12). BMVA Press. https://doi.org/10.5244/C.30.118

  • Cuturi, M. (2013). Sinkhorn distances: Lightspeed computation of optimal transport. Advances in Neural Information Processing Systems (NeurIPS), 26, 2292–2300.

    Google Scholar 

  • Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. Institute of electrical and electronics engineers (IEEE) computer society conference on computer vision and pattern recognition (CVPR) (Vol. 1, pp. 886–893).

  • Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). Imagenet: A large-scale hierarchical image database. Institute of electrical and electronics engineers (ieee) conference on computer vision and pattern recognition (cvpr) (pp. 248–255).

  • Dünser, A., Billinghurst, M., Wen, J., Lehtinen, V., & Nurminen, A. (2012). Exploring the use of handheld AR for outdoor navigation. Computers & Graphics, 36(8), 1084–1095.

    Article  Google Scholar 

  • Fu, C., Xiang, C., Wang, C., & Cai, D. (2019). Fast approximate nearest neighbor search with the navigating spreading-out graph. Proceedings of the VLDB Endowment, 12(5), 461–474. https://doi.org/10.14778/3303753.3303754

    Article  Google Scholar 

  • Gao, X., Shen, S., Hu, Z., & Wang, Z. (2019). Ground and aerial meta-data integration for localization and reconstruction: A review. Pattern Recognition Letters, 127, 202–214. https://doi.org/10.1016/j.patrec.2018.07.036

    Article  Google Scholar 

  • Geiger, A., Lenz, P., Stiller, C., & Urtasun, R. (2013). Vision meets robotics: The kitti dataset. International Journal of Robotics Research (IJRR).

  • Girshick, R. (2015). Fast r-cnn. Proceedings of the institute of electrical and electronics engineers (IEEE) international conference on computer vision (iccv) (pp. 1440–1448).

  • Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., & Bengio, Y. (2014). Generative adversarial nets. Advances in Neural Information Processing Systems (NeurIPS), 27, 1.

    Google Scholar 

  • Gu, Y., Wang, Y., & Li, Y. (2019). A survey on deep learning-driven remote sensing image scene understanding: Scene classification, scene retrieval and scene-guided object detection. https://doi.org/10.3390/app9102110

  • Haas, L., Alberti, S., & Skreta, M. (2023). Pigeon: Predicting image geolocations.

  • Hadsell, R., Chopra, S., & LeCun, Y. (2006). Dimensionality reduction by learning an invariant mapping. Institute of electrical and electronics engineers (IEEE) computer society conference on computer vision and pattern recognition (CVPR) (Vol. 2, pp. 1735–1742).

  • Hakeem, A., Vezzani, R., Shah, M., & Cucchiara, R. (2006). Estimating geospatial trajectory of a moving camera. In 18th International conference on pattern recognition (ICPR) (Vol. 2, pp. 82–87). https://doi.org/10.1109/ICPR.2006.499

  • Hartley, R., & Zisserman, A. (2003). Multiple view geometry in computer vision (2nd ed.). New York: Cambridge University Press.

    Google Scholar 

  • Hartley, R. I., & Sturm, P. (1997). Triangulation. Computer Vision and Image Understanding, 68(2), 146–157. https://doi.org/10.1006/cviu.1997.0547

    Article  Google Scholar 

  • Hays, J., & Efros, A. (2015). Large-scale image geolocalization. Multimodal Location Estimation of Videos and Images, 1, 41–62. https://doi.org/10.1007/978-3-319-09861-6_3

    Article  Google Scholar 

  • Hays, J., & Efros, A. A. (2008). im2gps: Estimating geographic information from a single image. In Proceedings of the institute of electrical and electronics engineers (ieee) conference on computer vision and pattern recognition (cvpr).

  • Hu, S., Feng, M., Nguyen, R. M., & Lee, G. H. (2018). CVM-net: Cross-view matching network for image-based ground-to-aerial geo-localization. Proceedings of the institute of electrical and electronics engineers (IEEE) conference on computer vision and pattern recognition (cvpr) (pp. 7258–7267).

  • Isola, P., Zhu, J.-Y., Zhou, T., & Efros, A. A. (2017). Image-to-image translation with conditional adversarial networks. Proceedings of the institute of electrical and electronics engineers (IEEE) conference on computer vision and pattern recognition (cvpr).

  • Jégou, H., Douze, M., Schmid, C., & Pérez, P. (2010). Aggregating local descriptors into a compact image representation. Institute of electrical and electronics engineers (IEEE) computer society conference on computer vision and pattern recognition (cvpr) (pp. 3304–3311). https://doi.org/10.1109/CVPR.2010.5540039

  • Kalogerakis, E., Vesselova, O., Hays, J., Efros, A. A., & Hertzmann, A. (2009). Image sequence geolocation with human travel priors. In Institute of Electrical and Electronics Engineers (IEEE) 12th International Conference on Computer Vision (ICCV) (pp. 253–260).

  • Karras, T., Laine, S., & Aila, T. (2019). A style-based generator architecture for generative adversarial networks. Proceedings of the institute of electrical and electronics engineers (ieee)/cvf conference on computer vision and pattern recognition (cvpr).

  • Kendall, A., & Cipolla, R. (2016). Modelling uncertainty in deep learning for camera relocalization. Institute of electrical and electronics engineers (IEEE) international conference on robotics and automation (ICRA) (pp. 4762–4769). https://doi.org/10.1109/ICRA.2016.7487679

  • Kim, D.-K., & Walter, M. R. (2017). Satellite image-based localization via learned embeddings. In Institute of electrical and electronics engineers (IEEE) international conference on robotics and automation (ICRA) (pp. 2073–2080). https://doi.org/10.1109/ICRA.2017.7989239

  • Kim, H. J., Dunn, E., & Frahm, J.-M. (2015). Predicting good features for image geo-localization using per-bundle vlad. Institute of electrical and electronics engineers (IEEE) international conference on computer vision (ICCV) (pp. 1170–1178). https://doi.org/10.1109/ICCV.2015.139

  • Kim, H. J., Dunn, E., & Frahm, J.-M. (2017). Learned contextual feature reweighting for image geolocalization. In Institute of electrical and electronics engineers (IEEE) conference on computer vision and pattern recognition (CVPR) (pp. 3251–3260). https://doi.org/10.1109/CVPR.2017.346

  • Kim, J., Lee, J. K., & Lee, K. M. (2016). Accurate image super-resolution using very deep convolutional networks. 2016 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1646–1654). https://doi.org/10.1109/CVPR.2016.182

  • Knight, P. A. (2008). The Sinkhorn–Knopp algorithm: Convergence and applications. SIAM Journal on Matrix Analysis and Applications, 30(1), 261–275.

    Article  MathSciNet  Google Scholar 

  • Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems (NeurIPS), 25, 1097–1105.

    Google Scholar 

  • Krylov, V. A., Kenny, E., & Dahyot, R. (2018). Automatic discovery and geotagging of objects from street view imagery. Remote Sensing, 10(5), 1. https://doi.org/10.3390/rs10050661

    Article  Google Scholar 

  • Lam, D., Kuzma, R., McGee, K., Dooley, S., Laielli, M., Klaric, M. K., & McCord, B. (2018). xview: Objects in context in overhead imagery. ArXiv arXiv:1802.07856.

  • Lazebnik, S., Schmid, C., & Ponce, J. (2006). Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Institute of electrical and electronics engineers (IEEE) computer society conference on computer vision and pattern recognition (CVPR) (Vol. 2, pp. 2169–2178). https://doi.org/10.1109/CVPR.2006.68

  • Lecun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324. https://doi.org/10.1109/5.726791

    Article  Google Scholar 

  • Ledig, C., Theis, L., Huszar, F., Caballero, J., Cunningham, A., Acosta, A., & Shi, W. (2017). Photo-realistic single image superresolution using a generative adversarial network. Proceedings of the institute of electrical and electronics engineers (IEEE) conference on computer vision and pattern recognition (cvpr).

  • Lin, T.-Y., Belongie, S., & Hays, J. (2013). Crossview image geolocalization. In Proceedings of the institute of electrical and electronics engineers (IEEE) conference on computer vision and pattern recognition (CVPR).

  • Lin, T.-Y., Cui, Y., Belongie, S., & Hays, J. (2015). Learning deep representations for ground-toaerial geolocalization. In Proceedings of the institute of electrical and electronics engineers (IEEE) conference on computer vision and pattern recognition (CVPR).

  • Lin, T.-Y., Goyal, P., Girshick, R., He, K., & Dollár, P. (2020). Focal loss for dense object detection. Institute of Electrical and Electronics Engineers (IEEE) Transactions on Pattern Analysis and Machine Intelligence (PAMI), 42(2), 318–327. https://doi.org/10.1109/TPAMI.2018.2858826

    Article  Google Scholar 

  • Liu, L., & Li, H. (2019). Lending orientation to neural networks for cross-view geolocalization. In Proceedings of the institute of electrical and electronics engineers (IEEE)/cvf conference on computer vision and pattern recognition (CVPR).

  • Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision (IJCV), 60(2), 91–110. https://doi.org/10.1023/B:VISI.0000029664.99615.94

    Article  Google Scholar 

  • Lu, X., Li, Z., Cui, Z., Oswald, M.R., Pollefeys, M., & Qin, R. (2020). Geometry-aware satellite-to-ground image synthesis for urban areas. Proceedings of the institute of electrical and electronics engineers (IEEE)/cvf conference on computer vision and pattern recognition (CVPR).

  • Martinson, E., Furlong, B., & Gillies, A. (2021). Training rare object detection in satellite imagery with synthetic gan images. In 2021 institute of electrical and electronics engineers (IEEE)/cvf conference on computer vision and pattern recognition workshops (cvprw) (pp. 2763–2770). https://doi.org/10.1109/CVPRW53098.2021.00311

  • Masone, C., & Caputo, B. (2021). A survey on deep visual place recognition. IEEE Access, 9, 19516–19547. https://doi.org/10.1109/ACCESS.2021.3054937

    Article  Google Scholar 

  • Matas, J., Chum, O., Urban, M., & Pajdla, T. (2004). Robust wide-baseline stereo from maximally stable extremal regions. Image and Vision Computing, 22(10), 761–767. https://doi.org/10.1016/j.imavis.2004.02.006

    Article  Google Scholar 

  • McManus, C., Churchill, W., Maddern, W., Stewart, A. D., & Newman, P. (2014). Shady dealings: Robust, long-term visual localisation using illumination invariance. Institute of electrical and electronics engineers (IEEE) international conference on robotics and automation (ICRA) (pp. 901–906). https://doi.org/10.1109/ICRA.2014.6906961

  • Mertan, A., Duff, D. J., & Unal, G. (2021). Single image depth estimation: An overview. ArXiv arXiv:2104.06456.

  • Middelberg, S., Sattler, T., Untzelmann, O., & Kobbelt, L. (2014). Scalable 6-dof localization on mobile devices. In Fleet, D., Pajdla, T., Schiele, B., & T. Tuytelaars (Eds.) European conference on computer vision (eccv) (pp. 268–283). Springer.

  • Mirza, M., & Osindero, S. (2014). Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784.

  • Muller-Budack, E., Pustu-Iren, K., & Ewerth, R. (2018). Geolocation estimation of photos using a hierarchical model and scene classification. Proceedings of the European conference on computer vision (ECCV).

  • Narzt, W., Pomberger, G., Ferscha, A., Kolb, D., Müller, R., Wieghardt, J., & Lindinger, C. (2006). Augmented reality navigation systems. Universal Access in the Information Society (UAIS), 4(3), 177–187.

    Article  Google Scholar 

  • Nassar, A. S., D’Aronco, S., Lefèvre, S., Wegner, J. D. (2020). Geograph: graph-based multi-view object detection with geometric cues end-toend. Vedaldi, A., Bischof, H., Brox, T., & Frahm, J.-M. (Eds.) European conference on computer vision (eccv) (pp. 488–504). Springer.

  • Nassar, A. S., Lefevre, S., Wegner, & J. D. (2019). Simultaneous multi-view instance detection with learned geometric soft-constraints. Proceedings of the IEEE/CVF international conference on computer vision (ICCV).

  • Neuhold, G., Ollmann, T., Bulò, S. R., & Kontschieder, P. (2017). The mapillary vistas dataset for semantic understanding of street scenes. In Institute of electrical and electronics engineers (IEEE) international conference on computer vision (ICCV) (pp. 5000–5009). https://doi.org/10.1109/ICCV.2017.534

  • Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision (IJCV), 42(3), 145–175.

    Article  Google Scholar 

  • Pavan, M., & Pelillo, M. (2003). A new graphtheoretic approach to clustering and segmentation. Institute of electrical and electronics engineers (IEEE) computer society conference on computer vision and pattern recognition (CVPR) (Vol. 1, pp. I-I). https://doi.org/10.1109/CVPR.2003.1211348

  • Pavan, M., & Pelillo, M. (2007). Dominant sets and pairwise clustering. Institute of Electrical and Electronics Engineers (IEEE) Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 29(1), 167–172. https://doi.org/10.1109/TPAMI.2007.250608

    Article  Google Scholar 

  • Pearson, K. (1901). Liii. On lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 2(11), 559–572. https://doi.org/10.1080/14786440109462720

    Article  Google Scholar 

  • Piasco, N., Sidibé, D., Demonceaux, C., & Gouet- Brunet, V. (2018). A survey on visual-based localization: On the benefit of heterogeneous data. Pattern Recognition, 74, 90–109. https://doi.org/10.1016/j.patcog.2017.09.013

    Article  Google Scholar 

  • Pramanick, S., Nowara, E.M., Gleason, J., Castillo, C.D., & Chellappa, R. (2022). Where in the world is this image? Transformer-based geo-localization in the wild. Avidan, S., Brostow, G., Cissé, M., Farinella, G. M. & Hassner, T. (Eds.) Computer vision—ECCV 2022 (pp. 196–215). Springer.

  • Pumarola, A., Agudo, A., Martinez, A. M., Sanfeliu, A., & Moreno-Noguer, F. (2018). Ganimation: Anatomically-aware facial animation from a single image. Proceedings of the european conference on computer vision (eccv) (pp. 818–833).

  • Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., & Sutskever, I. (2021). Learning transferable visual models from natural language supervision. In Meila, M., & Zhang, T. (Eds.) Proceedings of the 38th international conference on machine learning, ICML 2021, 18–24 July 2021, virtual event (Vol. 139, pp. 8748–8763). PMLR. http://proceedings.mlr.press/v139/radford21a.html

  • Regmi, K., & Borji, A. (2018). Cross-view image synthesis using conditional gans. Proceedings of the institute of electrical and electronics engineers (IEEE) conference on computer vision and pattern recognition (CVPR).

  • Regmi, K., & Shah, M. (2019). Bridging the domain gap for ground-to-aerial image matching. In Proceedings of the institute of electrical and electronics engineers (IEEE)/CVF international conference on computer vision (ICCV).

  • Ren, X., Bo, L., & Fox, D. (2012). Rgb-(d) scene labeling: Features and algorithms. Institute of electrical and electronics engineers (IEEE) conference on computer vision and pattern recognition (CVPR) (pp. 2759–2766).

  • Rodrigues, R., & Tani, M. (2021). Are these from the same place? seeing the unseen in crossview image geo-localization. In Proceedings of the institute of electrical and electronics engineers (IEEE)/CVF winter conference on applications of computer vision (WACV) (pp. 3753–3761).

  • Roshan Zamir, A., Ardeshir, S., & Shah, M. (2014). Gps-tag refinement using random walks with an adaptive damping factor. In Proceedings of the institute of electrical and electronics engineers (IEEE) conference on computer vision and pattern recognition (CVPR).

  • Santana, L.V., Brandao, A.S., & Sarcinelli-Filho, M. (2015). Outdoor waypoint navigation with the ar. drone quadrotor. International conference on unmanned aircraft systems (ICUAS) (pp. 303–311).

  • Saputra, M. R. U., Markham, A., & Trigoni, N. (2018). Visual slam and structure from motion in dynamic environments. ACM Computing Surveys (CSUR), 51, 1–36.

    Article  Google Scholar 

  • Saurer, O., Baatz, G., Köser, K., Ladický, L., & Pollefeys, M. (2015). Image based geolocalization in the Alps. International Journal of Computer Vision, 116, 1. https://doi.org/10.1007/s11263-015-0830-0

    Article  Google Scholar 

  • Schroff, F., Kalenichenko, D., & Philbin, J. (2015). Facenet: A unified embedding for face recognition and clustering. Proceedings of the institute of electrical and electronics engineers (IEEE) conference on computer vision and pattern recognition (CVPR).

  • Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., & Batra, D. (2017). Gradcam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the institute of electrical and electronics engineers (IEEE) international conference on computer vision (ICCV).

  • Seo, P. H., Weyand, T., Sim, J., & Han, B. (2018). Cplanet: Enhancing image geolocalization by combinatorial partitioning of maps. In Ferrari, V., Hebert, M., Sminchisescu, C., & Weiss, Y. (Eds.) European conference on computer vision (ECCV) (pp. 544–560). Springer.

  • Shechtman, E., & Irani, M. (2007). Matching local self-similarities across images and videos. Institute of electrical and electronics engineers (IEEE) conference on computer vision and pattern recognition (CVPR) (pp. 1–8). https://doi.org/10.1109/CVPR.2007.383198

  • Shermeyer, J., & Etten, A. V. (2019). The effects of super-resolution on object detection performance in satellite imagery. IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2019, 1432–1441.

    Google Scholar 

  • Shi, Y., Campbell, D., Yu, X., & Li, H. (2021). Geometry-guided street-view panorama synthesis from satellite imagery. arXiv preprint arXiv:2103.01623.

  • Shi, Y., Liu, L., Yu, X., & Li, H. (2019). Spatial-aware feature aggregation for image based cross-view geo-localization. Advances in Neural Information Processing Systems (NeurIPS), 32, 10090–10100.

    Google Scholar 

  • Shi, Y., Yu, X., Campbell, D., & Li, H. (2020, June). Where am i looking at? Joint location and orientation estimation by cross-view matching. In Proceedings of the institute of electrical and electronics engineers (IEEE)/CVF conference on computer vision and pattern recognition (CVPR).

  • Shi, Y., Yu, X., Liu, L., Zhang, T., & Li, H. (2020). Optimal feature transport for cross-view image geo-localization. Proceedings of the Association for the Advancement of Artificial Intelligence (AAAI) Conference on Artificial Intelligence, 34(07), 11990–11997. https://doi.org/10.1609/aaai.v34i07.6875

    Article  Google Scholar 

  • Shi, Y., Yu, X., Wang, S., & Li, H. (2022). Cvlnet: Cross-view semantic correspondence learning for video-based camera localization. arXiv preprint arXiv:2208.03660.

  • Shrivastava, A., Malisiewicz, T., Gupta, A., & Efros, A. A. (2011). Data-driven visual similarity for cross-domain image matching. In Proceedings of the 2011 Siggraph Asia Conference. Association for Computing Machinery (ACM). https://doi.org/10.1145/2024156.2024188

  • Sinkhorn, R., & Knopp, P. (1967). Concerning nonnegative matrices and doubly stochastic matrices. Pacific Journal of Mathematics, 21(2), 343–348.

    Article  MathSciNet  Google Scholar 

  • Suenderhauf, N., Shirazi, S., Jacobson, A., Dayoub, F., Pepperell, E., Upcroft, B., & Milford, M. (2015). Place recognition with convnet landmarks: Viewpoint-robust, condition-robust, training-free. Hsu, D. (Ed.) Robotics: Science and systems xi (pp. 1–10). Robotics: Science and Systems Conference.

  • Tang, H., Liu, H., Xu, D., Torr, P. H., & Sebe, N. (2021). Attentiongan: Unpaired image-to-image translation using attention-guided generative adversarial networks. Institute of Electrical and Electronics Engineers (IEEE) Transactions on Neural Networks and Learning Systems (TNNLS).

  • Tang, H., Xu, D., Sebe, N.,Wang, Y., Corso, J. J., & Yan, Y. (2019). Multi-channel attention selection gan with cascaded semantic guidance for cross-view image translation. Proceedings of the institute of electrical and electronics engineers (IEEE)/CVF conference on computer vision and pattern recognition (CVPR).

  • Thomee, B., Shamma, D. A., Friedland, G., Elizalde, B., Ni, K., Poland, D., & Li, L.-J. (2016). Yfcc100m: The new data in multimedia research. Commun. ACM, 59(2), 64–73. https://doi.org/10.1145/2812802

    Article  Google Scholar 

  • Tian, Y., Chen, C., & Shah, M. (2017). Cross-view image matching for geo-localization in urban environments. Proceedings of the institute of electrical and electronics engineers (IEEE) conference on computer vision and pattern recognition (CVPR).

  • Toker, A., Zhou, Q., Maximov, M., & Leal-Taixe, L. (2021). Coming down to earth: Satelliteto- street view synthesis for geo-localization. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (cvpr) (pp. 6488–6497).

  • Tomešek, J., Čadík, M., & Brejcha, J. (2022). Crosslocate: Cross-modal large-scale visual geolocalization in natural environments using rendered modalities. In 2022 IEEE/CVF winter conference on applications of computer vision (WACV) (pp. 2193–2202). https://doi.org/10.1109/WACV51458.2022.00225

  • Torii, A., Arandjelović, R., Sivic, J., Okutomi, M., & Pajdla, T. (2015). 24/7 place recognition by view synthesis. In Institute of electrical and electronics engineers (IEEE) conference on computer vision and pattern recognition (CVPR) (pp. 1808–1817). https://doi.org/10.1109/CVPR.2015.7298790

  • Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A., & Jegou, H. (2021). Training data efficient image transformers distillation through attention. International Conference on Machine Learning, 139, 10347–10357.

  • Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., & Polosukhin, I. (2017). Attention is all you need. In Guyon, I. et al. (Eds.) Advances in neural information processing systems (Vol. 30). Curran Associates, Inc. https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf

  • Verde, S., Resek, T., Milani, S., & Rocha, A. (2020). Ground-to-aerial viewpoint localization via landmark graphs matching. Institute of Electrical and Electronics Engineers (IEEE) Signal Processing Letters, 27, 1490–1494. https://doi.org/10.1109/LSP.2020.3017380

  • Vishal, K., Jawahar, C. V., & Chari, V. (2015). Accurate localization by fusing images and GPS signals. Proceedings of the institute of electrical and electronics engineers (IEEE) conference on computer vision and pattern recognition (CVPR) workshops.

  • Vo, N., & Hays, J. (2016). Localizing and orienting street views using overhead imagery. In Leibe, B., Matas, J., Sebe, N., & Welling, M. (Eds.) European conference on computer vision (ECCV) (pp. 494–509). Springer.

  • Vo, N., Jacobs, N., & Hays, J. (2017). Revisiting im2gps in the deep learning era. In Proceedings of the institute of electrical and electronics engineers (IEEE) international conference on computer vision (ICCV).

  • Vyas, S., Chen, C., & Shah, M. (2022). Gama: Cross-view video geo-localization. In Avidan, S., Brostow, G., Cissé, M., Farinella, G. M., & Hassner, T. (Eds.) Computer vision—ECCV 2022 (pp. 440–456). Springer.

  • Wang, T., Zheng, Z., Yan, C., Zhang, J., Sun, Y., Zheng, B., & Yang, Y. (2021). Each part matters: Local patterns facilitate cross-view geo-localization. Institute of Electrical and Electronics Engineers (IEEE) Transactions on Circuits and Systems for Video Technology (TCSVT), 1-1. https://doi.org/10.1109/TCSVT.2021.3061265

  • Wang, X., Yu, K., Wu, S., Gu, J., Liu, Y., Dong, C., & Change Loy, C. (2018). Esrgan: Enhanced super-resolution generative adversarial networks. Proceedings of the European conference on computer vision (ECCV) workshops.

  • Weyand, T., Kostrikov, I., & Philbin, J. (2016). Planet—photo geolocation with convolutional neural networks. In Leibe, B., Matas, J., Sebe, N., & Welling, M. (Eds.) European conference on computer vision (ECCV) (pp. 37–55). Springer.

  • Wilson, D., Alshaabi, T., Oort, C. M. V., Zhang, X., Nelson, J., & Wshah, S. (2021). Object tracking and geo-localization from street images. CoRR arXiv:2107.06257.

  • Woo, S., Park, J., Lee, J.-Y., & Kweon, I. S. (2018). Cbam: Convolutional block attention module. In Proceedings of the European conference on computer vision (ECCV).

  • Workman, S., Souvenir, R., & Jacobs, N. (2015). Wide-area image geolocalization with aerial reference imagery. In Proceedings of the institute of electrical and electronics engineers (IEEE) international conference on computer vision (ICCV).

  • Xia, G.-S., Bai, X., Ding, J., Zhu, Z., Belongie, S., Luo, J., & Zhang, L. (2018). Dota: A large-scale dataset for object detection in aerial images. Proceedings of the institute of electrical and electronics engineers (IEEE) conference on computer vision and pattern recognition (CVPR) (pp. 3974–3983).

  • Xia, H., Zhao, H., & Ding, Z. (2021). Adaptive adversarial network for source-free domain adaptation. Proceedings of the IEEE/CVF international conference on computer vision (ICCV) (pp. 9010–9019).

  • Xiao, J., Hays, J., Ehinger, K. A., Oliva, A., & Torralba, A. (2010). Sun database: Large-scale scene recognition from abbey to zoo. Institute of electrical and electronics engineers (IEEE) computer society conference on computer vision and pattern recognition (CVPR) (pp. 3485–3492). https://doi.org/10.1109/CVPR.2010.5539970

  • Yi, Z., Zhang, H., Tan, P., & Gong, M. (2017). Dualgan: Unsupervised dual learning for image-to-image translation. In Proceedings of the institute of electrical and electronics engineers (IEEE) international conference on computer vision (ICCV).

  • You, K., Long, M., Cao, Z., Wang, J., & Jordan, M. I. (2019). Universal domain adaptation. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR).

  • Zamir, A. R., & Shah, M. (2010). Accurate image localization based on Google Maps Street View. In Daniilidis, K., Maragos, P., & Paragios, N. (Eds.) European conference on computer vision (ECCV) (pp. 255–268). Springer.

  • Zamir, A. R., & Shah, M. (2014). Image geolocalization based on multiple nearest neighbor feature matching using generalized graphs. Institute of Electrical and Electronics Engineers (IEEE) Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 36(8), 1546–1558. https://doi.org/10.1109/TPAMI.2014.2299799

  • Zhai, M., Bessinger, Z., Workman, S., & Jacobs, N. (2017). Predicting ground-level scene layout from aerial imagery. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR).

  • Zhang, H., Berg, A., Maire, M., & Malik, J. (2006). Svm-knn: Discriminative nearest neighbor classification for visual category recognition. Institute of electrical and electronics engineers (IEEE) computer society conference on computer vision and pattern recognition (CVPR) (Vol. 2, pp. 2126–2136). https://doi.org/10.1109/CVPR.2006.301

  • Zhang, X., Li, X., Sultani, W., Zhou, Y., & Wshah, S. (2023). Cross-view geo-localization via learning disentangled geometric layout correspondence. In Proceedings of the AAAI Conference on Artificial Intelligence, 37(3), 3480–3488. https://doi.org/10.1609/aaai.v37i3.25457

  • Zhang, X., Sultani, W., & Wshah, S. (2023). Cross-view image sequence geo-localization. Proceedings of the IEEE/CVF winter conference on applications of computer vision (WACV) (pp. 2914–2923).

  • Zheng, L., Yang, Y., & Tian, Q. (2016). Sift meets CNN: A decade survey of instance retrieval. In IEEE Transactions on Pattern Analysis and Machine Intelligence. https://doi.org/10.1109/TPAMI.2017.2709749

  • Zheng, Z., Wei, Y., & Yang, Y. (2020). University-1652: A multi-view multi-source benchmark for drone-based geo-localization. In Proceedings of the 28th ACM international conference on multimedia (pp. 1395–1403). Association for Computing Machinery. https://doi.org/10.1145/3394171.3413896

  • Zhou, B., Lapedriza, A., Xiao, J., Torralba, A., & Oliva, A. (2014). Learning deep features for scene recognition using places database. In Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N., & Weinberger, K. Q. (Eds.) Advances in neural information processing systems (NeurIPS) (Vol. 27). Curran Associates, Inc.

  • Zhou, B., Liu, L., Oliva, A., & Torralba, A. (2014). Recognizing city identity via attribute analysis of geo-tagged images. In Fleet, D., Pajdla, T., Schiele, B., & Tuytelaars, T. (Eds.) European conference on computer vision (ECCV) (pp. 519–534). Springer.

  • Zhu, J.-Y., Park, T., Isola, P., & Efros, A. A. (2017). Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the institute of electrical and electronics engineers (IEEE) international conference on computer vision (ICCV).

  • Zhu, J.-Y., Zhang, R., Pathak, D., Darrell, T., Efros, A. A., Wang, O., & Shechtman, E. (2017). Toward multimodal image-to-image translation. In Guyon, I. et al. (Eds.) Advances in neural information processing systems (Vol. 30, pp. 465–476). Curran Associates, Inc.

  • Zhu, S., Shah, M., & Chen, C. (2022). Transgeo: Transformer is all you need for cross view image geo-localization. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR) (pp. 1162–1171).

  • Zhu, S., Yang, T., & Chen, C. (2021a). Revisiting street-to-aerial view image geo-localization and orientation estimation. In Proceedings of the institute of electrical and electronics engineers (IEEE)/CVF winter conference on applications of computer vision (WACV) (pp. 756–765).

  • Zhu, S., Yang, T., & Chen, C. (2021b). Vigor: Cross-view image geo-localization beyond one-to-one retrieval. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR) (pp. 3640–3649).

Author information

Corresponding authors

Correspondence to Daniel Wilson or Safwan Wshah.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Communicated by Ondra Chum.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

We have obtained the copyright for all figures used in this paper by purchasing all the applicable rights from their publishers.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wilson, D., Zhang, X., Sultani, W. et al. Image and Object Geo-Localization. Int J Comput Vis 132, 1350–1392 (2024). https://doi.org/10.1007/s11263-023-01942-3

  • DOI: https://doi.org/10.1007/s11263-023-01942-3
