Abstract
Cross-view geo-localization is the task of finding images that contain the same geographic target across multiple views. For example, given a query image taken from a UAV view, a matching model can retrieve the image of the same location from a gallery collected by satellites. Once a UAV-view image has been matched to its true satellite-view counterpart, which carries a geo-tag, the current geographic location of the UAV can be easily localized based on flight records. However, because the viewpoint changes drastically across platforms, traditional image processing methods have difficulty matching multi-view images. This paper proposes neural network-based approaches that apply the attention mechanism to the feature learning process, improving the ability to learn essential features from the input image. A different pooling method, generalized mean pooling, is also implemented to strengthen the global descriptor. The proposed models significantly improve accuracy and achieve competitive results on the University-1652 dataset.
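As a rough illustration of the pooling method named in the article title, generalized mean (GeM) pooling, the sketch below uses the standard textbook formulation: each channel of a feature map is reduced to (mean of x^p)^(1/p), so p = 1 gives average pooling and large p approaches max pooling. It also ranks a satellite gallery against a UAV query by cosine similarity. The function names `gem_pool` and `rank_gallery`, the default exponent p = 3, and the toy feature shapes are assumptions made for illustration; the backbone, attention modules, and hyperparameters used in the paper are not given in this abstract.

```python
import numpy as np


def gem_pool(feature_map, p=3.0, eps=1e-6):
    """Generalized mean pooling of a (C, H, W) feature map into a C-dim vector.

    p=3.0 is an illustrative default, not a value taken from the paper.
    """
    # Clamp to a small positive value so fractional powers are well defined.
    clamped = np.clip(feature_map, eps, None)
    # Per-channel power mean over the two spatial axes.
    return np.mean(clamped ** p, axis=(1, 2)) ** (1.0 / p)


def rank_gallery(query_vec, gallery_vecs):
    """Return gallery indices sorted from most to least similar (cosine)."""
    q = query_vec / np.linalg.norm(query_vec)
    g = gallery_vecs / np.linalg.norm(gallery_vecs, axis=1, keepdims=True)
    similarity = g @ q
    return np.argsort(-similarity)


# Toy usage: random arrays stand in for CNN feature maps (hypothetical data).
rng = np.random.default_rng(0)
uav_features = rng.random((8, 4, 4))            # one UAV-view query
satellite_features = rng.random((5, 8, 4, 4))   # five satellite-view images

query_descriptor = gem_pool(uav_features)
gallery_descriptors = np.stack([gem_pool(f) for f in satellite_features])
ranking = rank_gallery(query_descriptor, gallery_descriptors)
```

In a real pipeline the descriptors would come from a trained network rather than random arrays, and the top-ranked satellite image would supply the geo-tag for the query.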
Acknowledgments
This work was supported by the NEC C&C Foundation Grants for Researchers.
Additional information
This work was presented in part at the joint symposium of the 27th International Symposium on Artificial Life and Robotics, the 7th International Symposium on BioComplexity, and the 5th International Symposium on Swarm Behavior and Bio-Inspired Robotics (Online, January 25–27, 2022).
About this article
Cite this article
Bui, D.V., Kubo, M. & Sato, H. Attention-based neural network with Generalized Mean Pooling for cross-view geo-localization between UAV and satellite. Artif Life Robotics 28, 560–570 (2023). https://doi.org/10.1007/s10015-023-00867-x
DOI: https://doi.org/10.1007/s10015-023-00867-x