Attention-based neural network with Generalized Mean Pooling for cross-view geo-localization between UAV and satellite

Bui, Duc Viet; Kubo, Masao; Sato, Hiroshi

doi:10.1007/s10015-023-00867-x

Original Article
Published: 15 April 2023

Artificial Life and Robotics, Volume 28, pages 560–570 (2023)

Abstract

Cross-view geo-localization is the task of finding images that contain the same geographic target across multiple views. For example, given a query image from a UAV view, a matching model can find an image of the same location in a gallery collected by satellites. By using a UAV-view image to retrieve the true-matched, geo-tagged satellite-view image, the current geographic location of the UAV can be easily determined based on flight records. However, due to the extreme change of viewpoint across platforms, traditional image processing methods have difficulty matching multi-view images. This paper proposes neural network-based approaches that apply the attention mechanism to the feature learning process, improving the ability to learn essential features from the input image. A different pooling method, Generalized Mean Pooling, is also implemented to strengthen the global descriptor. The proposed models significantly improve accuracy and achieve competitive results on the University-1652 dataset.
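
As an illustration of the pooling step named in the title, the following is a minimal sketch of Generalized Mean (GeM) pooling in its commonly used form: each activation in a channel is raised to a power p, the results are averaged over all spatial positions, and the p-th root is taken. The full text is not available in this preview, so the function name gem_pool, the default p = 3, the clamping constant eps, and the toy feature map are all assumptions made for illustration and are not taken from the paper.

```python
import math


def gem_pool(feature_map, p=3.0, eps=1e-6):
    """Generalized Mean pooling over each channel of a C x H x W feature map.

    feature_map is a nested list indexed as [channel][row][column].
    Returns one pooled value per channel (a global descriptor of length C).
    The default p and eps are illustrative assumptions, not values from the paper.
    """
    pooled = []
    for channel in feature_map:
        # Flatten the spatial grid and clamp to eps so that v ** p is
        # well defined for non-integer p.
        values = [max(v, eps) for row in channel for v in row]
        mean_of_powers = sum(v ** p for v in values) / len(values)
        pooled.append(mean_of_powers ** (1.0 / p))
    return pooled


# Toy 2-channel, 2 x 2 feature map (hypothetical values).
feature_map = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.5, 0.5], [0.5, 0.5]],
]

avg_like = gem_pool(feature_map, p=1.0)   # p = 1 reduces to average pooling
max_like = gem_pool(feature_map, p=50.0)  # large p approaches max pooling
print(avg_like, max_like)
```

With p = 1 the operation is ordinary average pooling, and as p grows it approaches max pooling, so p controls how strongly the descriptor emphasizes the most activated regions of the image. In neural network implementations p is often treated as a learnable parameter; whether the paper does so cannot be confirmed from the abstract.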

Acknowledgments

This work was supported by the NEC C&C Foundation Grants for Researchers.

Author information

Authors and Affiliations

Department of Computer Science, National Defense Academy, 1-10-20 Hashirimizu, Yokosuka, 239-8686, Kanagawa, Japan
Duc Viet Bui, Masao Kubo & Hiroshi Sato

Corresponding author

Correspondence to Duc Viet Bui.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was presented in part at the joint symposium of the 27th International Symposium on Artificial Life and Robotics, the 7th International Symposium on BioComplexity, and the 5th International Symposium on Swarm Behavior and Bio-Inspired Robotics (Online, January 25–27, 2022).

About this article

Cite this article

Bui, D.V., Kubo, M. & Sato, H. Attention-based neural network with Generalized Mean Pooling for cross-view geo-localization between UAV and satellite. Artif Life Robotics 28, 560–570 (2023). https://doi.org/10.1007/s10015-023-00867-x

Received: 13 May 2022
Accepted: 06 March 2023
Published: 15 April 2023
Issue Date: August 2023
DOI: https://doi.org/10.1007/s10015-023-00867-x
