
Information disentanglement based cross-modal representation learning for visible-infrared person re-identification

Multimedia Tools and Applications

Abstract

Visible-infrared person re-identification (VI-ReID) is an important but challenging task in automated video surveillance and forensics. Although existing VI-ReID methods have achieved encouraging results, how to make full use of the information contained in cross-modality visible and infrared images has not been well studied. In this paper, we propose an Information Disentanglement based Cross-modal Representation Learning (IDCRL) approach for VI-ReID. Specifically, IDCRL first extracts shared and specific features from each modality with a shared feature learning module and a specific feature learning module, respectively. To ensure that the shared and specific information is well disentangled, we impose an orthogonality constraint on the shared and specific features of each modality. To make the shared features extracted from visible and infrared images of the same person highly similar, IDCRL designs a shared feature consistency constraint. Furthermore, IDCRL uses a modality-aware loss to ensure that useful modality-specific features are extracted from each modality effectively. The obtained shared and specific features are then concatenated as the representation of each image. Finally, an identity loss and a cross-modal discriminant loss are employed to enhance the discriminability of the obtained image representations. We conducted comprehensive experiments on the benchmark visible-infrared pedestrian datasets SYSU-MM01 and RegDB to evaluate the efficacy of our IDCRL approach. Experimental results demonstrate that IDCRL outperforms the compared state-of-the-art methods. On the SYSU-MM01 dataset, the rank-1 matching rate of our approach reaches 62.35% and 71.64% in the all-search and indoor modes, respectively. On the RegDB dataset, the rank-1 result of our approach reaches 76.32% and 75.49% in the visible-to-thermal and thermal-to-visible modes, respectively.
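To make the loss composition described in the abstract concrete, the following PyTorch-style snippet is a minimal sketch, not the authors' implementation: the layer shapes, the squared-cosine form of the orthogonality term, the mean-squared-error form of the shared feature consistency constraint, the unit loss weights, and the omission of the modality-aware and cross-modal discriminant losses are all simplifying assumptions made here for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class IDCRLHead(nn.Module):
    """Toy head illustrating the disentanglement idea: one shared and one
    specific projection per modality on top of backbone features."""

    def __init__(self, in_dim=2048, feat_dim=512, num_ids=395):
        super().__init__()
        self.shared_vis = nn.Linear(in_dim, feat_dim)   # shared feature learning (visible)
        self.shared_ir = nn.Linear(in_dim, feat_dim)    # shared feature learning (infrared)
        self.spec_vis = nn.Linear(in_dim, feat_dim)     # specific feature learning (visible)
        self.spec_ir = nn.Linear(in_dim, feat_dim)      # specific feature learning (infrared)
        self.classifier = nn.Linear(2 * feat_dim, num_ids)  # identity classifier on concatenated features

    def forward(self, x_vis, x_ir):
        sh_v, sp_v = self.shared_vis(x_vis), self.spec_vis(x_vis)
        sh_i, sp_i = self.shared_ir(x_ir), self.spec_ir(x_ir)
        rep_v = torch.cat([sh_v, sp_v], dim=1)           # final representation: shared + specific
        rep_i = torch.cat([sh_i, sp_i], dim=1)
        return sh_v, sp_v, sh_i, sp_i, rep_v, rep_i


def orthogonality_loss(shared, specific):
    # Push the shared and specific features of the same image towards
    # orthogonality (mean squared cosine similarity between the two vectors).
    s = F.normalize(shared, dim=1)
    p = F.normalize(specific, dim=1)
    return ((s * p).sum(dim=1) ** 2).mean()


def shared_consistency_loss(shared_vis, shared_ir):
    # Pull the shared features of paired visible/infrared images of the same
    # person together (here: a simple mean squared error).
    return F.mse_loss(shared_vis, shared_ir)


if __name__ == "__main__":
    head = IDCRLHead()
    labels = torch.randint(0, 395, (8,))
    x_vis, x_ir = torch.randn(8, 2048), torch.randn(8, 2048)   # stand-in backbone features
    sh_v, sp_v, sh_i, sp_i, rep_v, rep_i = head(x_vis, x_ir)

    loss = (
        F.cross_entropy(head.classifier(rep_v), labels)        # identity loss, visible branch
        + F.cross_entropy(head.classifier(rep_i), labels)      # identity loss, infrared branch
        + orthogonality_loss(sh_v, sp_v) + orthogonality_loss(sh_i, sp_i)
        + shared_consistency_loss(sh_v, sh_i)
    )
    print(loss.item())
```

In the paper these terms are further combined with the modality-aware loss and the cross-modal discriminant loss, and the exact loss forms and weights differ from the unit-weighted surrogates above; treat the snippet only as a structural illustration of the shared/specific disentanglement.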



Data Availability

The SYSU-MM01 dataset that supports the findings of this study is available from the corresponding author upon reasonable request. Researchers must sign a dataset release agreement before obtaining the download link. The related page for SYSU-MM01 is https://github.com/wuancong/SYSU-MM01. The RegDB dataset that supports the findings of this study is publicly available online at https://github.com/bismex/HiCMD and https://drive.google.com/file/d/1gnVt9GIQSvium_mcxc7AWLhSXm6lNWsa/view?usp=sharing.


Acknowledgments

This work was supported by the NSFC Project (No. 62176069), Young Scientists Fund of the National Natural Science Foundation of China (No. 62006070), Natural Science Foundation of Henan Province (Nos. 202300410092 and 202300410093), Key Scientific and Technological Project of Henan Province of China (Nos. 222102210204 and 222102210197), and the Excellent Youth Scientific Research Project of Hunan Education Department (No. 21B0582).

Author information


Corresponding authors

Correspondence to Xiaopan Chen or Xinyu Zhang.

Ethics declarations

Conflict of Interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhu, X., Zheng, M., Chen, X. et al. Information disentanglement based cross-modal representation learning for visible-infrared person re-identification. Multimed Tools Appl 82, 37983–38009 (2023). https://doi.org/10.1007/s11042-022-13669-3

