Abstract:
Multispectral pedestrian detection has attracted considerable interest over the past decade. Existing methods assume that RGB-thermal image pairs are well aligned, but pairs captured by different sensors are in practice only weakly aligned, which degrades detection accuracy. To alleviate this weak-alignment problem in multispectral tasks, this paper proposes a cross-modal learning network (CMLNet). A novel spatial-semantic alignment strategy is first designed to align the RGB and thermal features through a spatial transformation and a semantic mapping between the two modalities. A feature reselection module then filters redundant features before fusion. Finally, YOLOX is adopted as the detection framework. The proposed method is validated on the public KAIST dataset. Experimental results demonstrate that it is suitable for real-time applications, detecting pedestrians in 16 ms per RGB-thermal pair (about 62 frames per second), and achieves a miss rate of 18.12%, which is competitive with state-of-the-art approaches.
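The abstract gives no implementation details, so the following PyTorch sketch illustrates only one plausible reading of the described pipeline: an offset-based spatial transformation that warps thermal features toward the RGB view, 1x1 projections as the semantic mapping into a shared space, and channel-wise gating as the feature reselection before fusion. All module names, the offset-field design, and the gating layout are assumptions for illustration, not the authors' published architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialSemanticAlign(nn.Module):
    """Hypothetical alignment block: warp thermal features toward the RGB
    view (spatial transformation) and project both modalities into a shared
    embedding (semantic mapping)."""
    def __init__(self, channels):
        super().__init__()
        # Predicts a per-location 2-D offset field from the concatenated features.
        self.offset_head = nn.Conv2d(2 * channels, 2, kernel_size=3, padding=1)
        # 1x1 projections act as the semantic mapping into a common space.
        self.rgb_proj = nn.Conv2d(channels, channels, kernel_size=1)
        self.thermal_proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, f_rgb, f_thermal):
        n, _, h, w = f_rgb.shape
        offset = self.offset_head(torch.cat([f_rgb, f_thermal], dim=1))
        # Build a normalized sampling grid in [-1, 1] and shift it by the offsets.
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h, device=f_rgb.device),
            torch.linspace(-1, 1, w, device=f_rgb.device),
            indexing="ij",
        )
        base_grid = torch.stack([xs, ys], dim=-1).unsqueeze(0).expand(n, -1, -1, -1)
        grid = base_grid + offset.permute(0, 2, 3, 1)
        f_thermal_aligned = F.grid_sample(f_thermal, grid, align_corners=True)
        return self.rgb_proj(f_rgb), self.thermal_proj(f_thermal_aligned)

class FeatureReselection(nn.Module):
    """Hypothetical reselection block: channel-wise gating that suppresses
    redundant features before the cross-modal fusion."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, 2 * channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(2 * channels // reduction, 2 * channels, kernel_size=1),
            nn.Sigmoid(),
        )
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, f_rgb, f_thermal):
        x = torch.cat([f_rgb, f_thermal], dim=1)
        # Reweight channels, then fuse back to a single feature map.
        return self.fuse(x * self.gate(x))

# Usage sketch: the fused map would feed a YOLOX detection head.
align = SpatialSemanticAlign(256)
reselect = FeatureReselection(256)
f_rgb, f_thermal = torch.randn(2, 256, 80, 80), torch.randn(2, 256, 80, 80)
fused = reselect(*align(f_rgb, f_thermal))  # shape: (2, 256, 80, 80)
```

The offset-field formulation is only one way to realize the spatial transformation; the paper could equally use an affine or homography-based warp, and the squeeze-and-excitation-style gate is a stand-in for whatever reselection criterion the authors actually use.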
Date of Conference: 17-20 July 2023
Date Added to IEEE Xplore: 20 September 2023