Abstract
Visible-infrared cross-modality person re-identification (CM-ReID) has received extensive attention in the community because of its applicability to 24-hour surveillance. The large modality discrepancy makes the task very susceptible to background clutter, especially in infrared images. In this paper, we propose a mask-guided dynamic dual-task collaborative learning (MG-DDCL) method to extract background-irrelevant pedestrian representations. A dynamic dual-task collaborative learning strategy extracts pedestrian representations and generates foreground masks with a single unified convolutional neural network. This strategy improves mAP by 0.95% and Rank-1 accuracy by 1.9%. So that the guidance mask facilitates the cross-modality person re-identification task, we convert the hard mask produced by semantic segmentation into a friendlier soft mask and generate the foreground response map by regression learning. Compared with learning the mask by classification, this regression approach shows significant advantages. Extensive experiments on two datasets, SYSU-MM01 and RegDB, demonstrate the effectiveness of the proposed method.
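As a rough illustration of the soft-mask idea described in the abstract, the sketch below turns a binary (hard) foreground mask into a soft mask and scores a predicted foreground response map against it with a regression loss. The abstract does not state how the soft mask is built or which regression loss is used, so the box-filter smoothing, the mean-squared-error loss, and the names `soften_mask` and `regression_loss` are all assumptions made for this example, not the authors' implementation.

```python
import numpy as np


def soften_mask(hard_mask: np.ndarray, kernel_size: int = 3) -> np.ndarray:
    """Smooth a binary {0, 1} mask into a soft mask with values in [0, 1].

    ASSUMPTION: a box filter is used here only for illustration; the
    paper's actual hard-to-soft conversion is not given in the abstract.
    """
    pad = kernel_size // 2
    # Replicate border values so the output keeps the input size.
    padded = np.pad(hard_mask.astype(np.float64), pad, mode="edge")
    height, width = hard_mask.shape
    soft = np.zeros((height, width), dtype=np.float64)
    # Sum every shifted window position, then average.
    for dy in range(kernel_size):
        for dx in range(kernel_size):
            soft += padded[dy:dy + height, dx:dx + width]
    return soft / (kernel_size ** 2)


def regression_loss(pred_map: np.ndarray, soft_mask: np.ndarray) -> float:
    """Mean squared error between a predicted response map and the soft mask.

    ASSUMPTION: MSE stands in for the unspecified regression objective.
    """
    return float(np.mean((pred_map - soft_mask) ** 2))


# Toy 8x8 hard mask with a 4x4 foreground square.
hard = np.zeros((8, 8))
hard[2:6, 2:6] = 1.0
soft = soften_mask(hard, kernel_size=3)

# A hypothetical network output; here just a constant map for demonstration.
pred = np.full((8, 8), 0.5)
loss = regression_loss(pred, soft)
```

In this toy setup the soft mask stays at 1 well inside the foreground, falls to intermediate values along the boundary, and is 0 far from it, so a regression target of this kind penalizes boundary errors less sharply than a per-pixel binary classification target would.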
Availability of data and materials
The experimental data were obtained from publicly available datasets. The source code can be obtained by contacting the corresponding author.
Acknowledgements
This work was supported by the National Key R&D Program under Grant No. 2019YFF0301800, the National Natural Science Foundation of China under Grant No. 61379106, and the Shandong Provincial Natural Science Foundation under Grant Nos. ZR2013FM036 and ZR2015FM011.
Author information
Contributions
The first author, Wenbin Shao, designed the method and wrote the paper, and is its major contributor. The corresponding author, Yujie Liu, provided comprehensive guidance. The third author (Wenxin Zhang) and the fourth author (Zongmin Li) provided valuable advice and help.
Ethics declarations
Competing interest
The authors have no competing interests to declare that are relevant to the content of this article.
About this article
Cite this article
Shao, W., Liu, Y., Zhang, W. et al. Cross modality person re-identification via mask-guided dynamic dual-task collaborative learning. Appl Intell 54, 3723–3736 (2024). https://doi.org/10.1007/s10489-024-05344-x
DOI: https://doi.org/10.1007/s10489-024-05344-x