Mask-guided dual attention-aware network for visible-infrared person re-identification

Qi, Meibin; Wang, Suzhi; Huang, Guanghong; Jiang, Jianguo; Wu, Jingjing; Chen, Cuiqun

doi:10.1007/s11042-020-10431-5

Mask-guided dual attention-aware network for visible-infrared person re-identification

Published: 10 February 2021

Volume 80, pages 17645–17666, (2021)
Cite this article

Multimedia Tools and Applications Aims and scope Submit manuscript

Meibin Qi¹,
Suzhi Wang ORCID: orcid.org/0000-0002-4668-0836^1,2,
Guanghong Huang¹,
Jianguo Jiang¹,
Jingjing Wu¹ &
…
Cuiqun Chen¹

764 Accesses
9 Citations
Explore all metrics

Abstract

Given a person of interest in RGB images, Visible-Infrared Person Re-identification (VI-REID) aims at searching for this person in infrared images. It faces a number of challenges due to large cross-modality discrepancies and intra-modality variations caused by illuminations, human poses, viewpoints and cluttered backgrounds, etc. This paper proposes a Mask-guided Dual Attention-aware Network (MDAN) for VI-REID. MDAN consists of two individual networks for two different modalities respectively, whose feature representations are driven by mask-guided attention-aware information and multi-loss constraints. Specifically, we first utilize masked image as a supplement to the original image, so as to enhance the contour and appearance information which are extremely important clues for matching the features of pedestrians from visible and infrared modalities. Second, a Residual Attention Module (RAM) is put forward to capture fine-grained features and subtle differences among pedestrians, so as to learn more discriminative features of pedestrians from heterogeneous modalities by adaptively calibrating feature responses along channel and spatial dimensions. Third, features from two individual streams of two modalities will be directly aggregated to form a cross-modality identity representation. Extensive experiments demonstrate that the proposed approach effectively improves the performance of VI-REID task and remarkably outperforms the state-of-the-art methods.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Learn Robust Pedestrian Representation Within Minimal Modality Discrepancy for Visible-Infrared Person Re-Identification

Article 31 May 2022

Infrared-visible person re-identification via Dual-Channel attention mechanism

Article 06 February 2023

Combining information augmentation aggregation and dual-granularity feature fusion for visible-infrared person re-identification

Article 30 December 2024

References

Bahdanau D, Cho K, Bengio Y (2015) Neural machine translation by jointly learning to align and translate. In: PICLR
Barra P, Bisogni C, Nappi M, Freire-Obregón D, Castrillón-Santana M (2020) Gotcha-i: a multiview human videos dataset. security in computing and communications
Bedagkar-Gala A, Shah S (2014) A survey of approaches and trends in person re-identification. In: Image Vision Comput, pp 270–286
Chen T, Ding S, Xie J, Yuan Y, Chen W, Yang Y, Wang Z (2019) ABD-Net:, Attentive but Diverse Person Re-Identification. arXiv:1908.01114
Chen D, Zhang S, Ouyang W, Yang J, Tai Y (2018) Person search via a mask-guided two-stream cnn model. arXiv:1807.08107
Chen L, Zhang H, Xiao J, Nie L, Shao J, Liu W, Chu T (2017) SCA-CNN : Spatial And channel-wise attention in convolutional networks for image captioning. In: CVPR
Cheng D, Li X, Qi M, Liu X, Chen C, Niu D (2019) Exploring cross-modality commonalities via dual-stream multi-branch network for infrared-visible person re-identification. In: IEEE Access, pp 12824–12834
Choi S, Lee S, Kim Y, Kim T, Kim C (2020) Hi-cmd: hierarchical cross-modality disentanglement for visible-infrared person re-identification. In: CVPR
Dai P, Ji R, Wang H, Wu Q, Huang Y (2018) Crossmodality person re-identification with generative adversarial training. In: IJCAI, pp 677–683
Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: CVPR
De Marsico M, Distasi R, Ricciardi S, Riccio D (2014) A comparison of approaches for person re-identification. In: ICPRAM, pp 189–198
Feng Z, Lai J, Xie X (2019) Learning modality-specific representations for visible-infrared person re-identification, IEEE Transactions on Image Processing, 29, 579–590
Fu Y, Wei Y, Zhou Y, Shi H, Huang G, Wang X, Yao Z, Huang T (2018) Horizontal pyramid matching for person reidentification. arXiv:1804.05275
Guler RA, Trigeorgis G, Antonakos E, Snape P, Zafeiriou S, Kokkino I (2016) Densereg: Fully convolutional dense shape regression in-the-wild. arXiv:1612.01202
Hao Y, Li J, Wang N, Gao X (2020) Modality adversarial neural network for visible-thermal person re-identification, p Pattern Recognition
Hao Y, Wang N, Li J, Gao X (2019) Hsme: Hypersphere manifold embedding for visible thermal person re-identification. In: AAAI, pp 8385–8392
He K, Gkioxari G, Dollar P, Girshick R (2017) Mask r-cnn. arXiv:1703.06870
Hermans A, Beyer L, Leibe B (2017) In defense of the triplet loss for person re-identification. arXiv:1703.07737
Hu J, Shen L, Sun G (2017) Squeeze-and-excitation networks. arXiv:1709.01507
Jaderberg M, Simonyan K, Zisserman A, Kavukcuoglu K (2015) Spatial transformer networks. In: NIPS
Jiang J, Jin K, Qi M, Wang Q, Wu J, Chen C (2020) A cross-modal multi-granularity attention network for rgb-ir person re-identification. In: Neurocomputing
Kalayeh MM, Basaran E, Gokmen M, Kamasak ME, Shah M (2018) Human semantic parsing for person re-identification. In: CVPR, pp 1062–1071
Kang JK, Hoang TM, Park KR (2019) Person re-identification between visible and thermal camera images based on deep residual CNN using single input. [J]. IEEE Access, 7: pp 57972–57984.
Kumar V, Namboodiri A, Paluri M, Jawahar C (2017) Pose-aware person recognition. In: CVPR
Lan X, Wang H, Gong S, Zhu X (2017) Deep reinforcement learning attention selection for person re-identification. In: BMVC
Li S, Bak S, Car P, Wang X (2018) Diversity regularized spatiotemporal attention for video-based person re-identificatio. In: CVPR
Li D, Chen X, Zhang Z, Huang K (2017) Learning deep context-aware features over body and latent parts for person re-identification. In: CVPR
Li Y, Qi H, Dai J, Ji X, Wei Y (2017) Fully convolutional instance-aware semantic segmentation. In: CVPR
Li W, Zhu X, Gong S (2018) Harmonious attention network for person re-identification. In: CVPR
Liang X, Gong K, Shen X, Lin L (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. arXiv:1804.01984
Liao S, Hu Y, Zhu X, Li SZ (2015) Person re-identification by local maximal occurrence representation and metric learning. In: CVPR, pp 2197–2206
Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollar P, Zitnick CL (2014) Microsoft COCo: common objects in context. In: ECCV
Lin D, Tang X (2006) Inter-modality face recognition. In: ECCV
Lin L, Wang G, Zuo W, Feng X, Zhang L (2017) Cross-domain visual matching via generalized similarity measure and feature learning. In: TPAMI, pp 1089–1102
Liu X, Zhao H, Tian M, Sheng L, Shao J, Yi S, Yan J, Wang X (2017) Hydraplus-net: Attentive deep features for pedestrian analysis. In: ICCV
Nguyen DT, Hong HG, Kim KW, Park KR (2017) Person recognition system based on a combination of body images from visible light and thermal cameras. Sensors
Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M, Berg AC, Fei-Fei L (2015) ImageNet large scale visual recognition challenge. In: IJCV
Si C, Chen W, Wang W, Wang L, Tan T (2019) An attention enhanced graph convolutional lstm network for skeleton-based action recognition. In: CVPR, pp 1227–1236
Song C, Huang Y, Ouyang W, Wang L (2018) Mask-guided contrastive attention model for person re-identification. In: CVPR
Su C, Li J, Zhang S, Xing J, Gao W, Tian Q (2017) Pose-driven deep convolutional model for person re-identification. In: ICCV
Sun Y, Xu Q, Li Y, Zhang C, Li Y, Wang S, Sun J (2019) Perceive where to focus: Learning visibility-aware part-level features for partial person re-identification. In: CVPR
Sun Y, Zheng L, Yang Y, Tian Q, Wang S (2018) Beyond part models: Person retrieval with refined part pooling (and A strong convolutional baseline). In: ECCV, pp 501–518
Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception architecture for computer vision. In: CVPR
Tran D, Bourdev L, Fergus R, Torresani L, Paluri M (2015) Learning spatiotemporal features with 3d convolutional networks. In: ICCV
Vezzani R, Baltieri D, Cucchiara R (2013) People Reidentification in surveillance and forensics: a survey. In: ACM computing surveys
Wang X, Girshick RB, Gupta A, He K (2018) Non-local neural networks. In: CVPR
Wang Y, Wang L, You Y, Zou X, Chen V, Li S, Huang G, Hariharan B, et al., Weinberger KQ (2018) Resource aware person re-identification across multiple resolutions. In: CVPR, pp 8042–8051
Wang G, Yuan Y, Chen X, Li J, Zhou X (2018) Learning discriminative features with multiple granularities for person reidentification. arXiv:1804.01438
Wang Z, Zheng Y, Chuang Y-Y, Satoh S (2019) Learning to reduce dual-level discrepancy for infraredvisible person re-identification. In: CVPR
Wu J, Liu H, Jiang J, Qi M, Ren B, Li X, Wang Y (2020) Person attribute recognition by sequence contextual relation learning. In: IEEE
Wu A, Zheng W-S, Yu H-X, Gong S, Lai J (2017) Rgb-infrared cross-modality person re-identification. In: ICCV, pp 5380–5389
Xu K, Ba J, Kiros R, Cho K, Courville A, Salakhutdinov R, Zemel R, Bengio Y (2015) Show, attend and tell: Neural image caption generation with visual attention. In: ICML
Xu S, Cheng Y, Gu K, Yang Y, Chang S, Zhou P (2017) Jointly attentive spatial-temporal pooling networks for video-based person reidentification. In: IEEE, pp 4733–4742
Yang F, Yan K, Lu S, Jia H, Xie X, Gao W (2019) Attention driven person re-identification. In: Pattern Recognit, pp 143–155
Ye M, Lan X, Li J, Yuen PC (2018) Hierarchical discriminative learning for visible thermal person re-identification. In: AAAI
Ye M, Lan X, Wang Z, Yuen PC (2019) Bi-directional Center-Constrained Top-Ranking for Visible Thermal Person Re-Identification. In: IEEE TIFS
Ye M, Wang Z, Lan X, Yuen PC (2018) Visible thermal person re-identification via dual-constrained topranking. In: IJCAI
Zagoruyko S, Komodakis N (2016) Paying more attention to attention: Improving the performance of convolutional neural networks via attention transfer. arXiv:1612.03928
Zhang Y, Guo J, Huang Z, Qiu W, Fan H (2019) Multi-layer attention for person re-identification. In: MATEC web of conferences, Vol. 277
Zhang X, Luo H, Fan X, Xiang W, Sun Y, Xiao Q, Jiang W, Zhang C, Sun J (2017) Alignedreid: Surpassing human-level performance in person re-identification. arXiv:1711.08184
Zhao L, Li X, Zhuang Y, JingdongWang (2017) Deeply-learned part-aligned representations for person re-identification. In: ICCV
Zhao H, Tian M, Sun S, Shao J, Yan J, Yi S, Wang X, Tang X (2017) Spindle Net: Person re-identification with human body region guided feature decomposition and fusion. In: CVPR
Zheng L, Huang Y, Lu H, Yang Y (2017) Pose invariant embedding for deep person re-identification. arXiv:1701.07732
Zheng M, Karanam S, Wu Z, Radke RJ (2019) Re-identification with consistent attentive siamese networks. In: CVPR
Zheng L, Yang Y, Hauptmann AG (2016) Person re-identification: Past, present and future. arXiv:1610.02984
Zhou B, Khosla A, Lapedriza A, Oliva A, Torralba A (2016) Learning deep features for discriminative localization. In: CVPR, pp 2921–2929

Download references

Author information

Authors and Affiliations

School of Computer and Information, Hefei University of Technology, Hefei, Anhui, 230009, China
Meibin Qi, Suzhi Wang, Guanghong Huang, Jianguo Jiang, Jingjing Wu & Cuiqun Chen
The Anhui Siliepoch Technology Co., Ltd, China
Suzhi Wang

Authors

Meibin Qi
View author publications
You can also search for this author inPubMed Google Scholar
Suzhi Wang
View author publications
You can also search for this author inPubMed Google Scholar
Guanghong Huang
View author publications
You can also search for this author inPubMed Google Scholar
Jianguo Jiang
View author publications
You can also search for this author inPubMed Google Scholar
Jingjing Wu
View author publications
You can also search for this author inPubMed Google Scholar
Cuiqun Chen
View author publications
You can also search for this author inPubMed Google Scholar

Corresponding author

Correspondence to Suzhi Wang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work is supported by theNational Natural Science Foundation of China Grant 61876056 and Grant 61771180

Rights and permissions

Reprints and permissions

About this article

Cite this article

Qi, M., Wang, S., Huang, G. et al. Mask-guided dual attention-aware network for visible-infrared person re-identification. Multimed Tools Appl 80, 17645–17666 (2021). https://doi.org/10.1007/s11042-020-10431-5

Download citation

Received: 10 January 2020
Revised: 14 August 2020
Accepted: 22 December 2020
Published: 10 February 2021
Issue Date: May 2021
DOI: https://doi.org/10.1007/s11042-020-10431-5

Keywords

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Mask-guided dual attention-aware network for visible-infrared person re-identification

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Learn Robust Pedestrian Representation Within Minimal Modality Discrepancy for Visible-Infrared Person Re-Identification

Infrared-visible person re-identification via Dual-Channel attention mechanism

Combining information augmentation aggregation and dual-granularity feature fusion for visible-infrared person re-identification

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher’s note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now