Attention-based neural network with Generalized Mean Pooling for cross-view geo-localization between UAV and satellite

  • Original Article
  • Artificial Life and Robotics

Abstract

Cross-view geo-localization is the task of finding images that contain the same geographic target across multiple views. For example, given a query image from the UAV view, a matching model retrieves the image of the same location from a gallery collected by satellites. Once a UAV-view image has been matched to the correct geo-tagged satellite-view image, the current geographic location of the UAV can be easily determined based on flight records. However, because the viewpoint changes drastically across platforms, traditional image processing methods have difficulty matching multi-view images. This paper proposes neural network-based approaches that apply an attention mechanism to the feature learning process, improving the ability to learn essential features from the input image. A different pooling method, Generalized Mean Pooling, is also used to strengthen the global descriptor. The proposed models significantly improve accuracy and achieve competitive results on the University-1652 dataset.
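
As a rough illustration of the retrieval setting described in the abstract, the sketch below shows Generalized Mean (GeM) pooling in its commonly used form, followed by a cosine-similarity ranking of gallery descriptors against a query descriptor. It is not the authors' implementation: the function names `gem_pool` and `rank_gallery`, the default exponent `p = 3`, and the `(C, H, W)` array layout are assumptions made for this example, and the attention module is omitted entirely.

```python
import numpy as np


def gem_pool(feature_map, p=3.0, eps=1e-6):
    """Generalized Mean pooling over the spatial dims of a (C, H, W) map.

    p = 1 reduces to average pooling; large p approaches max pooling.
    The default p and eps are illustrative assumptions, not paper values.
    """
    # Clamp to a small positive value so the fractional power is defined.
    clamped = np.clip(feature_map, eps, None)
    return np.mean(clamped ** p, axis=(1, 2)) ** (1.0 / p)


def rank_gallery(query_desc, gallery_descs):
    """Return gallery indices sorted from most to least similar.

    query_desc:    (D,) global descriptor of the UAV-view query.
    gallery_descs: (N, D) global descriptors of satellite-view images.
    Similarity is cosine similarity (an assumption for this sketch).
    """
    q = query_desc / np.linalg.norm(query_desc)
    g = gallery_descs / np.linalg.norm(gallery_descs, axis=1, keepdims=True)
    sims = g @ q
    return np.argsort(-sims)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical backbone outputs: 8 channels on a 4x4 spatial grid.
    query_map = rng.random((8, 4, 4))
    gallery_maps = rng.random((5, 8, 4, 4))

    query_desc = gem_pool(query_map)
    gallery_descs = np.stack([gem_pool(m) for m in gallery_maps])
    print(rank_gallery(query_desc, gallery_descs))
```

In a learned setting the exponent `p` is often treated as a trainable parameter; here it is fixed to keep the example small.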

Acknowledgments

This work was supported by the NEC C&C Foundation Grants for Researchers.

Author information

Corresponding author

Correspondence to Duc Viet Bui.

Additional information

This work was presented in part at the joint symposium of the 27th International Symposium on Artificial Life and Robotics, the 7th International Symposium on BioComplexity, and the 5th International Symposium on Swarm Behavior and Bio-Inspired Robotics (Online, January 25–27, 2022).

About this article

Cite this article

Bui, D.V., Kubo, M. & Sato, H. Attention-based neural network with Generalized Mean Pooling for cross-view geo-localization between UAV and satellite. Artif Life Robotics 28, 560–570 (2023). https://doi.org/10.1007/s10015-023-00867-x

  • DOI: https://doi.org/10.1007/s10015-023-00867-x
