Abstract:
Visual self-localization technology is essential for unmanned aerial vehicles (UAVs) to achieve autonomous navigation and mission execution in environments where global navigation satellite system (GNSS) signals are unavailable. This technology estimates the UAV's geographic location by performing cross-view matching between UAV and satellite images. However, the large viewpoint gap between UAV and satellite images degrades the accuracy of existing cross-view matching methods. To address this, we adapt the DINOv2 model to the UAV visual localization task and propose a DINOv2-based UAV visual self-localization method. To bridge the inherent gap between pre-trained models and the cross-view matching task, we propose a global-local feature adaptive enhancement method (GLFA). This method leverages a Transformer and multi-scale convolutions to capture long-range dependencies and local spatial information in visual images, improving the model's ability to recognize key discriminative landmarks. In addition, we propose a cross-enhancement method based on a spatial pyramid (CESP), which constructs a multi-scale spatial pyramid to cross-enhance features, effectively improving the features' ability to perceive multi-scale spatial information. Experimental results demonstrate that the proposed method achieves 86.27% R@1 and 88.87% SDM@1 on the public DenseUAV benchmark dataset, providing a novel solution for UAV visual self-localization.
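The abstract describes GLFA only at a high level. The following is a minimal PyTorch sketch of how such a block could pair a Transformer branch (long-range dependencies) with parallel multi-scale depthwise convolutions (local spatial cues); the class name GLFABlock, all dimensions, kernel sizes, and the concatenate-and-project fusion are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a GLFA-style block: a Transformer branch models
# long-range dependencies over patch tokens while multi-scale depthwise
# convolutions capture local spatial information; the two are fused.
# All names, dimensions, and the fusion rule are assumptions for illustration.
import torch
import torch.nn as nn

class GLFABlock(nn.Module):
    def __init__(self, dim: int = 768, heads: int = 8):
        super().__init__()
        # Global branch: one Transformer encoder layer over patch tokens.
        self.global_branch = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, batch_first=True)
        # Local branch: depthwise convolutions at several kernel sizes.
        self.local_branch = nn.ModuleList([
            nn.Conv2d(dim, dim, k, padding=k // 2, groups=dim)
            for k in (3, 5, 7)
        ])
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, tokens: torch.Tensor, hw: tuple) -> torch.Tensor:
        # tokens: (B, N, C) patch tokens from a DINOv2-style backbone,
        # where N == hw[0] * hw[1].
        b, n, c = tokens.shape
        g = self.global_branch(tokens)                 # long-range context
        x = tokens.transpose(1, 2).reshape(b, c, *hw)  # (B, C, H, W)
        l = sum(conv(x) for conv in self.local_branch) # multi-scale local cues
        l = l.flatten(2).transpose(1, 2)               # back to (B, N, C)
        return self.fuse(torch.cat([g, l], dim=-1))    # fuse global + local

feats = torch.randn(2, 16 * 16, 768)    # dummy 16x16 patch grid
out = GLFABlock()(feats, (16, 16))
print(out.shape)                        # torch.Size([2, 256, 768])
```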
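Similarly, a CESP-style module might pool the feature map into a multi-scale spatial pyramid and feed each level back into the full-resolution features. The sketch below uses upsample-and-add as the cross-enhancement rule; the pyramid sizes and the enhancement scheme are assumptions chosen for a self-contained example, not the paper's design.

```python
# Hypothetical sketch of a CESP-style block: build a spatial pyramid of the
# feature map at several pooled resolutions, then cross-enhance the
# full-resolution features with each level (here via upsample-and-add).
import torch
import torch.nn as nn
import torch.nn.functional as F

class CESPBlock(nn.Module):
    def __init__(self, dim: int = 768, levels: tuple = (1, 2, 4)):
        super().__init__()
        self.levels = levels
        # One 1x1 projection per pyramid level.
        self.proj = nn.ModuleList([nn.Conv2d(dim, dim, 1) for _ in levels])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) feature map.
        h, w = x.shape[-2:]
        out = x
        for size, proj in zip(self.levels, self.proj):
            level = proj(F.adaptive_avg_pool2d(x, size))   # pooled pyramid level
            out = out + F.interpolate(level, size=(h, w),  # enhance full scale
                                      mode='bilinear', align_corners=False)
        return out

x = torch.randn(2, 768, 16, 16)
print(CESPBlock()(x).shape)   # torch.Size([2, 768, 16, 16])
```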
Published in: IEEE Robotics and Automation Letters (Volume 10, Issue 2, February 2025)