ABSTRACT
Cross-view geo-localization aims to retrieve images of the same geographic location captured from different viewpoints, and it remains a challenging task in computer vision. Interference from visually similar images and from the environment surrounding the target building significantly reduces matching accuracy in complex scenes. To address this problem, we propose a cross-view geo-localization method based on a dual-branch pattern and multi-scale context, targeting challenging datasets with numerous distractors. The method uses a Transformer feature extraction network to reduce the loss of fine-grained features. In addition, a dual-branch structure is designed to capture image semantic information and local context information bidirectionally, which copes with the large number of distractors among satellite images and improves geo-localization accuracy in complex scenes. Quantitative experiments show that both the recall rate (Recall) and the image retrieval average precision (AP) improve significantly on the benchmark dataset University-1652 and the challenging dataset University-160K, where our method achieves advanced cross-view geo-localization performance.
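The two indicators named in the abstract, Recall and AP, can be illustrated with a minimal sketch. It assumes the common retrieval definitions (Recall@K is 1 when a true match appears in the top K results; AP is the mean precision over the ranks where true matches occur) and cosine similarity for ranking. The abstract does not specify the exact evaluation protocol, so the function names, toy features, and labels below are hypothetical and chosen only for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_gallery(query_feat, gallery_feats):
    """Gallery indices sorted from most to least similar to the query."""
    sims = [cosine(query_feat, g) for g in gallery_feats]
    return sorted(range(len(gallery_feats)), key=lambda i: -sims[i])


def recall_at_k(ranking, gallery_labels, query_label, k):
    """1.0 if any of the top-k ranked items shares the query's label."""
    return float(any(gallery_labels[i] == query_label for i in ranking[:k]))


def average_precision(ranking, gallery_labels, query_label):
    """Mean of the precision values at each rank holding a true match."""
    hits, precisions = 0, []
    for rank, idx in enumerate(ranking, start=1):
        if gallery_labels[idx] == query_label:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


# Toy gallery (assumed data): index 0 is a distractor that resembles the query.
gallery_feats = [[1.0, 0.0], [0.0, 1.0], [0.8, 0.6], [-1.0, 0.0]]
gallery_labels = [7, 3, 3, 9]
query_feat, query_label = [1.0, 0.2], 3

ranking = rank_gallery(query_feat, gallery_feats)
r1 = recall_at_k(ranking, gallery_labels, query_label, k=1)
r2 = recall_at_k(ranking, gallery_labels, query_label, k=2)
ap = average_precision(ranking, gallery_labels, query_label)
```

In this toy case the distractor ranks first, so Recall@1 is 0 while Recall@2 is 1, and AP is lowered by the same distractor. This is the kind of effect that distractors in a large gallery have on both indicators.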
Index Terms
- Dual-branch Pattern and Multi-scale Context Facilitate Cross-view Geo-localization