Abstract:
Visible-infrared vehicle re-identification (VIVR) seeks to match vehicle images of the same identity taken by cameras of different modalities. The noticeable disparity between visible and infrared modalities leads to attention deviations, causing deep models to incorrectly focus on different local regions of vehicles in visible and infrared images. We observe that the spatial distributions of distinguishing local regions, such as logos, front windows, and wheels, exhibit similarity in average images obtained from both visible and infrared images. Based on this, we propose a modality-consistent attention (MCA) approach for VIVR. Unlike image-level attention, our MCA is identity-level attention that holistically emphasizes the distinguishing regions of a vehicle identity across multiple images captured from various viewpoints. Furthermore, we constrain the differences between the identity-level spatial attention masks resulting from the visible and infrared modalities. This approach helps deep networks focus consistently on learning the distinguishing local characteristics of vehicles across different modalities and viewpoints. Our experiments on the RGBN300 and MSVR310 datasets demonstrate that our approach achieves state-of-the-art performance.
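The abstract's core idea, identity-level attention masks averaged over a vehicle's images and a consistency constraint between the two modalities' masks, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the min-max normalization, and the L1 form of the consistency term are all assumptions made for illustration.

```python
import numpy as np

def identity_attention_mask(attn_maps):
    """Average per-image spatial attention maps of shape (N, H, W)
    into a single identity-level mask, normalized to [0, 1].
    (Illustrative choice; the paper's aggregation may differ.)"""
    mask = attn_maps.mean(axis=0)
    return (mask - mask.min()) / (mask.max() - mask.min() + 1e-8)

def modality_consistency_loss(vis_maps, ir_maps):
    """Mean absolute difference between the identity-level masks of
    the visible and infrared modalities (an assumed L1 penalty that
    constrains the two masks to agree spatially)."""
    m_vis = identity_attention_mask(vis_maps)
    m_ir = identity_attention_mask(ir_maps)
    return float(np.abs(m_vis - m_ir).mean())

# Toy example: 4 visible and 4 infrared attention maps for one identity.
rng = np.random.default_rng(0)
vis = rng.random((4, 8, 8))
ir = rng.random((4, 8, 8))
loss = modality_consistency_loss(vis, ir)
```

Minimizing such a term during training would push the network to highlight the same vehicle regions (logos, windows, wheels) regardless of the capturing modality, which is the stated goal of MCA.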
Published in: IEEE Signal Processing Letters (Volume 31)