Abstract:
Cross-view geo-localization has been widely used as an important technique for determining the geographical location of unmanned aerial vehicles (UAVs). Although various image retrieval methods have been proposed, cross-view geo-localization between drone and satellite images remains challenging because of their highly inconsistent view angles. In this article, we propose a new framework, the Swin-radial-locality network (SRLN), to extract robust image feature representations. Specifically, SRLN is based on a pruned version of the Swin transformer and integrates multiscale feature aggregation within a Siamese network structure that shares weights and is equipped with multiclassification heads. SRLN mainly comprises a radial-slicer-network (RSN) and a local-pattern-network (LPN), which are designed to harmonize directional information from drone-captured images with broader environmental features from satellite imagery; both are crucial for capturing angle and feature details between drone and satellite images. The RSN focuses on capturing fine-grained features that represent the drone's directional information, while the LPN provides a more comprehensive analysis of broader environmental features. Extensive experiments are carried out on two widely used public benchmark datasets, University-1652 and SUES-200. With an improvement of more than 3% over existing methods in both drone-view target localization and drone navigation tasks, the results validate the performance of our multiscale feature fusion model, which achieves state-of-the-art results.
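As a rough illustration of the two partitioning ideas named in the abstract, the sketch below splits a small single-channel feature map into concentric square rings (a locality-style partition) and into angular sectors around the center (a radial-style partition), then average-pools each part. The abstract does not specify how the RSN or LPN partition features, so the ring and sector rules, the function names `ring_index`, `sector_index`, and `pool_parts`, and the part counts are all assumptions made for illustration, not the authors' implementation.

```python
import math

def ring_index(r, c, h, w, n_rings):
    """Square-ring index of cell (r, c); 0 is the innermost ring (assumed rule)."""
    # Normalized Chebyshev distance of the cell center from the grid center, in [0, 1).
    dy = abs((r + 0.5) / h - 0.5) * 2
    dx = abs((c + 0.5) / w - 0.5) * 2
    return min(int(max(dx, dy) * n_rings), n_rings - 1)

def sector_index(r, c, h, w, n_sectors):
    """Angular sector index of cell (r, c) around the grid center (assumed rule)."""
    angle = math.atan2((r + 0.5) - h / 2, (c + 0.5) - w / 2)  # in (-pi, pi]
    frac = (angle + math.pi) / (2 * math.pi)                  # mapped to (0, 1]
    return min(int(frac * n_sectors), n_sectors - 1)

def pool_parts(fmap, n_parts, index_fn):
    """Average-pool a 2D feature map (list of lists) over each part."""
    h, w = len(fmap), len(fmap[0])
    sums = [0.0] * n_parts
    counts = [0] * n_parts
    for r in range(h):
        for c in range(w):
            k = index_fn(r, c, h, w, n_parts)
            sums[k] += fmap[r][c]
            counts[k] += 1
    # Empty parts (possible on very small grids) pool to 0.0.
    return [s / n if n else 0.0 for s, n in zip(sums, counts)]

# Toy 4x4 map: inner 2x2 block holds 1.0, the outer border holds 3.0.
fmap = [[3.0, 3.0, 3.0, 3.0],
        [3.0, 1.0, 1.0, 3.0],
        [3.0, 1.0, 1.0, 3.0],
        [3.0, 3.0, 3.0, 3.0]]
ring_features = pool_parts(fmap, 2, ring_index)      # one value per ring
sector_features = pool_parts(fmap, 4, sector_index)  # one value per sector
```

In a Siamese retrieval setup of the kind the abstract describes, each pooled part would typically feed its own classification head, with the same weights applied to the drone and satellite branches; that wiring is omitted here.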
Published in: IEEE Transactions on Geoscience and Remote Sensing (Volume: 62)