DOI: 10.1145/3607834.3616572
short-paper

Dual-branch Pattern and Multi-scale Context Facilitate Cross-view Geo-localization

Published: 29 October 2023

ABSTRACT

Cross-view geo-localization aims to match images of the same geographic location captured from different viewpoints, and it remains a challenging task in computer vision. In complex scenes, visually similar images and the environment surrounding the target building interfere with matching and significantly reduce accuracy. To address this problem, we propose a cross-view geo-localization method based on a dual-branch pattern and multi-scale context, aimed at challenging datasets with numerous distractors. The method uses a Transformer feature extraction network to reduce the loss of fine-grained features. A dual-branch structure is designed to capture image semantic information and local context information bidirectionally, which effectively handles the larger number of distractors among satellite images and improves geo-localization accuracy in complex scenes. Quantitative experiments show that both recall (Recall) and image retrieval average precision (AP) improve significantly on the benchmark dataset University-1652 and the challenging dataset University-160K, and that our method achieves advanced cross-view geo-localization performance.
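To make the two reported metrics concrete, the sketch below computes Recall@K and average precision for a single query from a ranked gallery. It assumes the common non-interpolated definition of AP and uses hypothetical names (`recall_at_k`, `average_precision`) and a toy ranking; it is not taken from the paper, and the official University-1652 evaluation script may differ in detail.

```python
from typing import Sequence


def recall_at_k(ranked_labels: Sequence[int], query_label: int, k: int) -> float:
    """Return 1.0 if any of the top-k gallery items shares the query's label, else 0.0."""
    return 1.0 if query_label in ranked_labels[:k] else 0.0


def average_precision(ranked_labels: Sequence[int], query_label: int) -> float:
    """Mean of the precision values measured at the rank of each true match."""
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Toy example (illustrative data only): gallery location labels sorted by
# descending similarity to a query whose true location label is 7.
ranking = [3, 7, 5, 7, 9]
r1 = recall_at_k(ranking, 7, 1)     # top-1 item is a distractor
r5 = recall_at_k(ranking, 7, 5)     # a true match appears within the top 5
ap = average_precision(ranking, 7)  # matches at ranks 2 and 4
```

Averaging these per-query values over all queries gives the dataset-level Recall@K and AP. Distractors lower both metrics by pushing true matches further down the ranking, which is why a gallery with many distractors, such as University-160K, is the harder setting.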


Published in

UAVM '23: Proceedings of the 2023 Workshop on UAVs in Multimedia: Capturing the World from a New Perspective
November 2023, 86 pages
ISBN: 9798400702860
DOI: 10.1145/3607834

Copyright © 2023 ACM


Publisher: Association for Computing Machinery, New York, NY, United States
