Abstract
Cross-modal person re-identification between the visible (RGB) and infrared (IR) modalities is important for nighttime surveillance applications. Beyond the cross-modal discrepancy caused by the different camera spectra, RGB-IR person re-identification also suffers from large cross-modal and intra-modal variations caused by different camera views and person poses. Moreover, existing visible-infrared person re-identification (VI-ReID) methods tend to learn global representations, which have limited discriminative power and are not robust to noisy images. In this paper, we propose TAANet, a novel triple-attentive aggregation learning method that mines intra-modal hierarchical and cross-modal graph-level contextual cues for VI-ReID. We propose an intra-modal hybrid weight attention module that extracts discriminative aggregated local features by mining the relationships among channels and among local features. To enhance robustness to noisy samples, we introduce an improved triplet loss combined with a center loss. The improved triplet loss takes into account the distance between each sample and its nearest different class, which keeps classes a certain distance apart and improves the discriminability of the features. Extensive experiments show that TAANet outperforms state-of-the-art methods in a variety of settings.
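The abstract does not spell out the exact form of the improved triplet loss. The plain-Python sketch below illustrates the general idea under stated assumptions: a batch-hard triplet formulation (each anchor's farthest same-class sample versus its nearest sample from any other class) plus a batch-averaged center loss with fixed class centers. The function names, `margin=0.3`, and the weight `lam=0.01` are hypothetical choices for illustration, not the authors' implementation or hyperparameters.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hard_triplet_loss(feats: List[Vector], labels: List[int], margin: float = 0.3) -> float:
    """Batch-hard triplet loss (an assumed stand-in for the paper's improved triplet loss).

    For each anchor, the farthest same-class sample is the positive and the
    closest sample from any other class is the negative, so the loss pushes
    the nearest different class at least `margin` beyond the hardest positive.
    """
    losses = []
    for i, (fa, ya) in enumerate(zip(feats, labels)):
        pos = [euclidean(fa, fb) for j, (fb, yb) in enumerate(zip(feats, labels))
               if yb == ya and j != i]
        neg = [euclidean(fa, fb) for fb, yb in zip(feats, labels) if yb != ya]
        if not pos or not neg:
            continue  # no valid triplet for this anchor in the batch
        losses.append(max(0.0, max(pos) - min(neg) + margin))
    return sum(losses) / len(losses) if losses else 0.0


def center_loss(feats: List[Vector], labels: List[int], centers: Dict[int, Vector]) -> float:
    """Half the batch-averaged squared distance from each feature to its class center."""
    total = sum(euclidean(f, centers[y]) ** 2 for f, y in zip(feats, labels))
    return 0.5 * total / len(feats)


def combined_loss(feats: List[Vector], labels: List[int], centers: Dict[int, Vector],
                  margin: float = 0.3, lam: float = 0.01) -> float:
    """Hypothetical weighting: triplet term plus `lam` times the center term."""
    return hard_triplet_loss(feats, labels, margin) + lam * center_loss(feats, labels, centers)


if __name__ == "__main__":
    # Toy batch: two identities, two 2-D features each (illustrative values only).
    feats = [[0.0, 0.0], [0.0, 2.0], [1.0, 0.0], [1.0, 2.0]]
    labels = [0, 0, 1, 1]
    centers = {0: [0.0, 1.0], 1: [1.0, 1.0]}  # fixed here; learned during real training
    print(hard_triplet_loss(feats, labels), center_loss(feats, labels, centers))
    print(combined_loss(feats, labels, centers))
```

In a real VI-ReID pipeline these terms would be computed on network embeddings with a tensor library, and the class centers would be updated during training rather than held fixed as in this toy example.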
About this article
Cite this article
Huang, P., Zhu, S., Wang, D. et al. Cross-modality person re-identification with triple-attentive feature aggregation. Multimed Tools Appl 81, 4455–4473 (2022). https://doi.org/10.1007/s11042-021-11739-6
DOI: https://doi.org/10.1007/s11042-021-11739-6