Abstract:
Drone-based vehicle detection is a critical task within intelligent transportation systems. Existing methods that rely solely on a single visible or infrared modality often struggle to achieve both precise and robust detection. Effectively integrating cross-modal information to assist vehicle detection remains a significant challenge. In this article, we propose a mask-guided Mamba fusion (MGMF) method for visible-infrared vehicle detection in aerial scenes. The proposed MGMF framework consists of two key components: the masked regularization constraint module (MRCM) and the state-space fusion module (SSFM). First, in MRCM, we use candidate regions from one modality to cover the corresponding regions of intermediate-level features from the other modality, while a regularization constraint extracts cross-modal guidance. This design allows cross-modal features focused on vehicle areas to be extracted from both modalities for fusion. Second, in SSFM, we propose mapping cross-modal features into a shared hidden state for interaction. This reduces disparities between the cross-modal features and enhances the representation, enabling better perception of intermodal correlations. When evaluated on the DroneVehicle dataset, our MGMF achieves 80.24% mAP, establishing new state-of-the-art performance. Ablation studies further demonstrate the effectiveness of our MRCM and SSFM in enhancing visible-infrared fusion for vehicle detection.
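To make the two components concrete, the following is a minimal NumPy sketch of the ideas the abstract describes, not the paper's implementation: `mask_features` covers feature regions of one modality using candidate boxes from the other (the MRCM's regularization loss is omitted), and `shared_state_fusion` runs a toy linear recurrence in which both modalities update one shared hidden state, a hypothetical simplification of a Mamba-style selective scan. All function names, the box format `(y0, y1, x0, x1)`, and the scalar parameters `A` and `B` are assumptions for illustration.

```python
import numpy as np

def mask_features(feat, boxes):
    """Cover (zero-out) regions of a (C, H, W) feature map given candidate
    boxes (y0, y1, x0, x1) detected in the other modality. This mirrors the
    masking idea only; the actual MRCM also applies a regularization
    constraint to extract cross-modal guidance."""
    masked = feat.copy()
    for y0, y1, x0, x1 in boxes:
        masked[:, y0:y1, x0:x1] = 0.0
    return masked

def shared_state_fusion(f_vis, f_ir, A=0.9, B=0.5):
    """Toy state-space fusion: scan over the flattened spatial positions and
    let both modalities drive one shared hidden state,
        h_t = A * h_{t-1} + B * (x_vis_t + x_ir_t),
    so the output at each position reflects accumulated cross-modal context.
    A and B are fixed scalars here; a real selective-scan model would make
    them learned and input-dependent."""
    C, H, W = f_vis.shape
    xv = f_vis.reshape(C, -1)
    xi = f_ir.reshape(C, -1)
    h = np.zeros(C)
    out = np.zeros_like(xv)
    for t in range(xv.shape[1]):
        h = A * h + B * (xv[:, t] + xi[:, t])
        out[:, t] = h
    return out.reshape(C, H, W)
```

In this sketch the shared hidden state `h` is what couples the two modalities: each spatial step mixes visible and infrared features into the same recurrence, which is the abstract's "shared hidden state for interaction" in its simplest possible form.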
Published in: IEEE Transactions on Geoscience and Remote Sensing (Volume: 62)