Abstract
Cross-view geolocalization identifies the geographic location of street view images by matching them with a georeferenced satellite database. Significant challenges arise due to the drastic appearance and geometry differences between views. In this paper, we propose a new approach for cross-view image geo-localization, i.e., the Panorama-BEV Co-Retrieval Network. Specifically, by utilizing the ground plane assumption and geometric relations, we convert street view panorama images into the BEV view, reducing the gap between street panoramas and satellite imagery. In the existing retrieval of street view panorama images and satellite images, we introduce BEV and satellite image retrieval branches for collaborative retrieval. By retaining the original street view retrieval branch, we overcome the limited perception range issue of BEV representation. Our network enables comprehensive perception of both the global layout and local details around the street view capture locations. Additionally, we introduce CVGlobal, a global cross-view dataset that is closer to real-world scenarios. This dataset adopts a more realistic setup, with street view directions not aligned with satellite images. CVGlobal also includes cross-regional, cross-temporal, and street view to map retrieval tests, enabling a comprehensive evaluation of algorithm performance. Our method excels in multiple tests on common cross-view datasets such as CVUSA, CVACT, VIGOR, and our newly introduced CVGlobal, surpassing the current state-of-the-art approaches. The code and datasets can be found at https://github.com/yejy53/EP-BEV.
This work was partially done during the internship at Shanghai AI Lab.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Bansal, M., Sawhney, H.S., Cheng, H., Daniilidis, K.: Geo-localization of street views with aerial image databases. In: Proceedings of the 19th ACM International Conference on Multimedia, pp. 1125–1128 (2011)
Cai, S., Guo, Y., Khan, S., Hu, J., Wen, G.: Ground-to-aerial image geo-localization with a hard exemplar reweighting triplet loss. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8391–8400 (2019)
Deuser, F., Habel, K., Oswald, N.: Sample4geo: hard negative sampling for cross-view geo-localisation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 16847–16856 (2023)
Goodfellow, I., et al.: Generative adversarial networks. Commun. ACM 63(11), 139–144 (2020)
Hu, S., Feng, M., Nguyen, R.M., Lee, G.H.: CVM-Net: cross-view matching network for image-based ground-to-aerial geo-localization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7258–7267 (2018)
Hu, S., Lee, G.H.: Image-based geo-localization using satellite imagery. Int. J. Comput. Vision 128(5), 1205–1219 (2020)
Lin, T.Y., Belongie, S., Hays, J.: Cross-view image geolocalization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 891–898 (2013)
Liu, L., Li, H.: Lending orientation to neural networks for cross-view geo-localization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5624–5633 (2019)
Liu, Z., Mao, H., Wu, C.Y., Feichtenhofer, C., Darrell, T., Xie, S.: A convnet for the 2020s. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11976–11986 (2022)
Pan, B., Sun, J., Leung, H.Y.T., Andonian, A., Zhou, B.: Cross-view semantic segmentation for sensing surroundings. IEEE Robot. Autom. Lett. 5(3), 4867–4873 (2020)
Peng, L., Chen, Z., Fu, Z., Liang, P., Cheng, E.: Bevsegformer: bird’s eye view semantic segmentation from arbitrary camera rigs. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 5935–5943 (2023)
Reiher, L., Lampe, B., Eckstein, L.: A sim2real deep learning approach for the transformation of images from multiple vehicle-mounted cameras to a semantically segmented image in bird’s eye view. In: 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC), pp. 1–7. IEEE (2020)
Sarlin, P.E., et al.: Orienternet: visual localization in 2D public maps with neural matching. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 21632–21642 (2023)
Sarlin, P.E., Trulls, E., Pollefeys, M., Hosang, J., Lynen, S.: Snap: self-supervised neural maps for visual positioning and semantic understanding. In: Advances in Neural Information Processing Systems, vol. 36 (2024)
Shi, Y., Liu, L., Yu, X., Li, H.: Spatial-aware feature aggregation for image based cross-view geo-localization. In: Advances in Neural Information Processing Systems, vol. 32 (2019)
Shi, Y., Wu, F., Perincherry, A., Vora, A., Li, H.: Boosting 3-DoF ground-to-satellite camera localization accuracy via geometry-guided cross-view transformer. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 21516–21526 (2023)
Shi, Y., Yu, X., Campbell, D., Li, H.: Where am i looking at? Joint location and orientation estimation by cross-view matching. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4064–4072 (2020)
Shi, Y., Yu, X., Liu, L., Campbell, D., Koniusz, P., Li, H.: Accurate 3-DoF camera geo-localization via ground-to-satellite image matching. IEEE Trans. Pattern Anal. Mach. Intell. 45(3), 2682–2697 (2022)
Shi, Y., Yu, X., Liu, L., Zhang, T., Li, H.: Optimal feature transport for cross-view image geo-localization. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 11990–11997 (2020)
Thoma, J., Paudel, D.P., Chhatkuli, A., Probst, T., Gool, L.V.: Mapping, localization and path planning for image-based navigation using visual features and map. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7383–7391 (2019)
Tian, Y., Chen, C., Shah, M.: Cross-view image matching for geo-localization in urban environments. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3608–3616 (2017)
Toker, A., Zhou, Q., Maximov, M., Leal-Taixé, L.: Coming down to earth: satellite-to-street view synthesis for geo-localization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6488–6497 (2021)
Vo, N.N., Hays, J.: Localizing and orienting street views using overhead imagery. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 494–509. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_30
Wang, T., et al.: Each part matters: local patterns facilitate cross-view geo-localization. IEEE Trans. Circuits Syst. Video Technol. 32(2), 867–879 (2021)
Wang, X., Xu, R., Cui, Z., Wan, Z., Zhang, Y.: Fine-grained cross-view geo-localization using a correlation-aware homography estimator. In: Advances in Neural Information Processing Systems, vol. 36 (2024)
Workman, S., Souvenir, R., Jacobs, N.: Wide-area image geolocalization with aerial reference imagery. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3961–3969 (2015)
Yang, H., Lu, X., Zhu, Y.: Cross-view geo-localization with layer-to-layer transformer. Adv. Neural. Inf. Process. Syst. 34, 29009–29020 (2021)
Ye, J., et al.: SG-BEV: satellite-guided bev fusion for cross-view semantic segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 27748–27757 (2024)
Zhai, M., Bessinger, Z., Workman, S., Jacobs, N.: Predicting ground-level scene layout from aerial imagery. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 867–875 (2017)
Zhang, X., Li, X., Sultani, W., Zhou, Y., Wshah, S.: Cross-view geo-localization via learning disentangled geometric layout correspondence. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, pp. 3480–3488 (2023)
Zheng, Z., Wei, Y., Yang, Y.: University-1652: a multi-view multi-source benchmark for drone-based geo-localization. In: Proceedings of the 28th ACM International Conference on Multimedia, pp. 1395–1403 (2020)
Zhu, S., Shah, M., Chen, C.: Transgeo: transformer is all you need for cross-view image geo-localization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1162–1171 (2022)
Zhu, S., Yang, T., Chen, C.: Vigor: cross-view image geo-localization beyond one-to-one retrieval. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3640–3649 (2021)
Zhu, Y., Chen, S., Lu, X., Chen, J.: Cross-view image synthesis from a single image with progressive parallel GAN. IEEE Trans. Geosci. Remote Sens. (2023)
Zhu, Y., Yang, H., Lu, Y., Huang, Q.: Simple, effective and general: a new backbone for cross-view image geo-localization. arXiv preprint arXiv:2302.01572 (2023)
Acknowledgements
This project was funded in part by National Natural Science Foundation of China (Grant No. 42201358) and Shanghai AI Lab.
Author information
Authors and Affiliations
Corresponding authors
Editor information
Editors and Affiliations
1 Electronic supplementary material
Below is the link to the electronic supplementary material.
Rights and permissions
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ye, J. et al. (2025). Cross-View Image Geo-Localization with Panorama-BEV Co-retrieval Network. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15095. Springer, Cham. https://doi.org/10.1007/978-3-031-72913-3_5
Download citation
DOI: https://doi.org/10.1007/978-3-031-72913-3_5
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72912-6
Online ISBN: 978-3-031-72913-3
eBook Packages: Computer ScienceComputer Science (R0)