Abstract
Given a person of interest in RGB images, Visible-Infrared Person Re-identification (VI-REID) aims to find the same person in infrared images. The task is challenging because of large cross-modality discrepancies and intra-modality variations caused by illumination, human pose, viewpoint, and cluttered backgrounds. This paper proposes a Mask-guided Dual Attention-aware Network (MDAN) for VI-REID. MDAN consists of two individual networks, one per modality, whose feature representations are driven by mask-guided attention-aware information and multi-loss constraints. First, we use masked images as a supplement to the original images in order to enhance contour and appearance information, which are important cues for matching pedestrian features across the visible and infrared modalities. Second, we propose a Residual Attention Module (RAM) that captures fine-grained features and subtle differences among pedestrians; by adaptively calibrating feature responses along the channel and spatial dimensions, it learns more discriminative pedestrian features from heterogeneous modalities. Third, the features from the two modality-specific streams are directly aggregated to form a cross-modality identity representation. Extensive experiments demonstrate that the proposed approach effectively improves performance on the VI-REID task and markedly outperforms state-of-the-art methods.
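To make the idea of calibrating feature responses along the channel and spatial dimensions concrete, the following is a minimal sketch of a residual attention step in plain Python. It is not the RAM described in the paper: the function name `residual_attention`, the parameter-free sigmoid gates, and the nested-list feature map are all assumptions chosen for illustration, and a trained module would replace the gates with learned layers.

```python
import math


def sigmoid(v):
    """Logistic function mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def residual_attention(x):
    """Reweight a feature map x (nested lists, shape [C][H][W]).

    ASSUMPTION: both gates are parameter-free stand-ins for the
    learned attention layers a real network would use.
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])

    # Channel gate: global average pooling per channel, squashed to (0, 1).
    channel_gate = [
        sigmoid(sum(sum(row) for row in x[c]) / (H * W)) for c in range(C)
    ]

    # Spatial gate: mean over channels at each location, squashed to (0, 1).
    spatial_gate = [
        [sigmoid(sum(x[c][i][j] for c in range(C)) / C) for j in range(W)]
        for i in range(H)
    ]

    # Residual form: identity path plus the attention-weighted path,
    # so the original response is never suppressed entirely.
    return [
        [
            [x[c][i][j] * (1.0 + channel_gate[c] * spatial_gate[i][j])
             for j in range(W)]
            for i in range(H)
        ]
        for c in range(C)
    ]


if __name__ == "__main__":
    # Toy 2-channel, 2x2 feature map (hypothetical values).
    features = [[[1.0, 0.0], [0.5, 2.0]], [[0.2, 0.1], [0.0, 1.5]]]
    for channel in residual_attention(features):
        print(channel)
```

The residual form keeps the input on an identity path, so the attention weights can only emphasize responses relative to one another rather than zero them out.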













This work is supported by the National Natural Science Foundation of China under Grants 61876056 and 61771180.
Cite this article
Qi, M., Wang, S., Huang, G. et al. Mask-guided dual attention-aware network for visible-infrared person re-identification. Multimed Tools Appl 80, 17645–17666 (2021). https://doi.org/10.1007/s11042-020-10431-5
DOI: https://doi.org/10.1007/s11042-020-10431-5