research-article

Modern Backbone for Efficient Geo-localization

Authors:

Yujin Zhang

UAVM '23: Proceedings of the 2023 Workshop on UAVs in Multimedia: Capturing the World from a New Perspective

Pages 31 - 37

https://doi.org/10.1145/3607834.3616562

Published: 29 October 2023

Abstract

With the development of autonomous driving technology, visual geo-localization has attracted steadily growing attention. The key problem is matching the correct image pair across different viewpoints. Existing geo-localization methods focus on designing complex attention mechanisms on top of traditional backbones such as VGG and ResNet, but neglect the importance of the backbone network itself. In this article, we propose a modern-backbone-based geo-localization method (MBEG). MBEG adopts EVA-02, a recent vision foundation model that has been extensively pre-trained on large-scale datasets, as its backbone. In addition, we present a feature rotation encoding strategy to eliminate the effects of image rotation. We also apply knowledge distillation to reduce the number of network parameters for practical deployment. Our method performs strongly on the University-1652 dataset, and our solution ranked first in the UAVs in Multimedia Challenge on the University-160k dataset.
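The abstract frames geo-localization as retrieving the correct image pair across viewpoints, for example a drone-view query against a gallery of satellite images. A minimal sketch of that retrieval step follows. It assumes each image has already been mapped to an embedding vector by a backbone (EVA-02 in MBEG) and that candidates are ranked by cosine similarity; the abstract does not state the similarity measure, and the function names `rank_gallery` and `recall_at_1` are hypothetical. Recall@1, the fraction of queries whose top-ranked candidate is the true match, is a standard metric on cross-view benchmarks such as University-1652.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def rank_gallery(query: Sequence[float], gallery: Sequence[Sequence[float]]) -> List[int]:
    """Hypothetical helper: return gallery indices from most to least similar to the query."""
    q = l2_normalize(query)
    scores = [sum(a * b for a, b in zip(q, l2_normalize(g))) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)


def recall_at_1(queries: Sequence[Sequence[float]],
                gallery: Sequence[Sequence[float]],
                true_idx: Sequence[int]) -> float:
    """Hypothetical helper: fraction of queries whose top-ranked gallery item is correct."""
    hits = sum(rank_gallery(q, gallery)[0] == t for q, t in zip(queries, true_idx))
    return hits / len(queries)


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for real backbone features (assumption).
    satellite = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
    drone = [[0.9, 0.1], [0.1, 0.8]]
    print(recall_at_1(drone, satellite, [0, 1]))  # 1.0
```

With real data, the toy vectors would be replaced by backbone embeddings, and a large gallery would normally be scored with a single batched matrix product rather than Python loops.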
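The abstract does not describe how the feature rotation encoding strategy works. As an illustrative stand-in, and not MBEG's method, one common way to reduce a model's sensitivity to rotation is to average its features over rotated copies of the input. The sketch below assumes images are 2-D grids and uses a hypothetical `toy_encoder` in place of a real backbone. Averaging over the four 90-degree rotations makes the descriptor exactly invariant to 90-degree rotations of the input, though not to arbitrary angles.

```python
from typing import Callable, List

Image = List[List[float]]  # a single-channel image as a 2-D grid (assumption for the sketch)


def rot90(img: Image) -> Image:
    """Rotate a 2-D grid 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def toy_encoder(img: Image) -> List[float]:
    """Hypothetical stand-in for a backbone; deliberately NOT rotation invariant."""
    return [img[0][0], sum(img[0])]


def rotation_averaged_embedding(img: Image, encode: Callable[[Image], List[float]]) -> List[float]:
    """Average encoder features over the 0/90/180/270-degree rotations of the input.

    Rotating the input by 90 degrees only reorders these four views, so the
    average is exactly invariant to 90-degree rotations (not to arbitrary angles).
    """
    views = [img]
    for _ in range(3):
        views.append(rot90(views[-1]))
    feats = [encode(v) for v in views]
    return [sum(col) / len(feats) for col in zip(*feats)]


if __name__ == "__main__":
    img = [[1.0, 2.0], [3.0, 4.0]]
    print(toy_encoder(img), toy_encoder(rot90(img)))              # raw features differ
    print(rotation_averaged_embedding(img, toy_encoder))          # [2.5, 5.0]
    print(rotation_averaged_embedding(rot90(img), toy_encoder))   # [2.5, 5.0]
```

This approach costs four forward passes per image, which is one practical reason to prefer building rotation handling into the feature encoding itself.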
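The abstract states that knowledge distillation is used to reduce the parameter count, but not which variant. A minimal sketch of the classic soft-target formulation follows: a small student is trained to match the temperature-softened output distribution of a large teacher (presumably the EVA-02-based model, though the abstract does not say). The temperature of 4.0 and the function names are assumptions. Retrieval models are also often distilled at the embedding level instead, which this sketch does not show.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() numerically stable."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: Sequence[float],
                      teacher_logits: Sequence[float],
                      temperature: float = 4.0) -> float:
    """Hypothetical helper: KL(teacher || student) between temperature-softened
    distributions, multiplied by T^2 to keep the gradient scale comparable across T.
    The default temperature is an assumption, not a value from the paper."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s) if pt > 0)
    return kl * temperature ** 2


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]   # hypothetical logits from a large teacher
    aligned = [2.0, 0.5, -1.0]   # student that already matches the teacher
    off = [-1.0, 0.5, 2.0]       # student that disagrees with the teacher
    print(distillation_loss(aligned, teacher))  # 0.0
    print(distillation_loss(off, teacher) > 0)  # True
```

In training, this term is usually combined with the student's ordinary task loss; the weighting between the two is a tuning choice.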



Index Terms

Modern Backbone for Efficient Geo-localization
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision representations
        Image representations
      2. Computer vision tasks
        Visual content-based indexing and retrieval


Published In


UAVM '23: Proceedings of the 2023 Workshop on UAVs in Multimedia: Capturing the World from a New Perspective

November 2023

86 pages

ISBN: 9798400702860

DOI: 10.1145/3607834

General Chairs:
Zhedong Zheng, National University of Singapore, Singapore
Yujiao Shi, The Australian National University, Australia
Tingyu Wang, Hangzhou Dianzi University, China
Jun Liu, Singapore University of Technology and Design, Singapore
Jianwu Fang, Chang'an University, China
Yunchao Wei, Beijing Jiaotong University, China
Tat-seng Chua, National University of Singapore, Singapore

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

Research-article

Funding Sources

Science and Technology Commission of Shanghai Municipality

Conference

MM '23

Sponsor:

SIGMM

MM '23: The 31st ACM International Conference on Multimedia

November 2, 2023

Ottawa ON, Canada

Article Metrics

Total Citations: 5
Total Downloads: 210
Downloads (Last 12 months): 123
Downloads (Last 6 weeks): 8

Reflects downloads up to 12 Feb 2025

Cited By

Feng T, Li Q, Wang X, Wang M, Li G, Zhu W (2024). Multi-weather Cross-view Geo-localization Using Denoising Diffusion Models. In Proceedings of the 2nd Workshop on UAVs in Multimedia: Capturing the World from a New Perspective (eds. Zheng Z, Shi Y, Wang T, Chen C, Zhu P, Hartley R), 35-39. DOI: 10.1145/3689095.3689103. Online publication date: 28-Oct-2024.
https://dl.acm.org/doi/10.1145/3689095.3689103

Wang X, Xu Y, Fu X, Zha Z (2024). MGAW: An Effective Method for Geo-localization in Adverse Weather. In Proceedings of the 2nd Workshop on UAVs in Multimedia: Capturing the World from a New Perspective (eds. Zheng Z, Shi Y, Wang T, Chen C, Zhu P, Hartley R), 19-23. DOI: 10.1145/3689095.3689101. Online publication date: 28-Oct-2024.
https://dl.acm.org/doi/10.1145/3689095.3689101

Deuser F, Werner M, Habel K, Oswald N (2024). Optimizing Geo-Localization with k-Means Re-Ranking in Challenging Weather Conditions. In Proceedings of the 2nd Workshop on UAVs in Multimedia: Capturing the World from a New Perspective (eds. Zheng Z, Shi Y, Wang T, Chen C, Zhu P, Hartley R), 9-13. DOI: 10.1145/3689095.3689099. Online publication date: 28-Oct-2024.
https://dl.acm.org/doi/10.1145/3689095.3689099

Wang T, Yang Z, Chen Q, Sun Y, Yan C (2024). Rethinking Pooling for Multi-Granularity Features in Aerial-View Geo-Localization. IEEE Signal Processing Letters, Vol. 31, 3005-3009. DOI: 10.1109/LSP.2024.3484330. Online publication date: 2024.
https://doi.org/10.1109/LSP.2024.3484330

Berton G, Stoken A, Caputo B, Masone C (2024). EarthLoc: Astronaut Photography Localization by Indexing Earth from Space. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 12754-12764. DOI: 10.1109/CVPR52733.2024.01212. Online publication date: 16-Jun-2024.
https://doi.org/10.1109/CVPR52733.2024.01212
