
Learn Robust Pedestrian Representation Within Minimal Modality Discrepancy for Visible-Infrared Person Re-Identification

  • Regular Paper
Journal of Computer Science and Technology

Abstract

Visible-infrared person re-identification has attracted extensive attention from the community because of its promising applications in video surveillance. Visible and infrared images differ greatly in modality because they are produced by different imaging mechanisms. Existing studies alleviate this discrepancy by aligning the modality distributions or by extracting modality-shared features from the original images. However, they overlook a key solution: converting visible images directly to gray images, which reduces the modality discrepancy efficiently and effectively. In this paper, we recast the cross-modality person re-identification task from visible-infrared matching to gray-infrared matching, a setting we call the minimal modality discrepancy. In addition, we propose a pyramid feature integration network (PFINet), which mines discriminative, refined features of pedestrian images and fuses high-level, semantically strong features to build a robust pedestrian representation. Specifically, PFINet first performs feature extraction from concrete to abstract, followed by top-down semantic transfer, to obtain multi-scale feature maps. Second, the multi-scale feature maps are fed into the discriminative-region response module, which emphasizes identity-discriminative regions through a spatial attention mechanism. Finally, the pedestrian representation is obtained by feature integration. Extensive experiments demonstrate the effectiveness of PFINet, which achieves a rank-1 accuracy of 81.95% and an mAP of 74.49% in the multi-all evaluation mode of the SYSU-MM01 dataset.
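The visible-to-gray conversion described in the abstract can be illustrated with a minimal sketch. The abstract does not say which conversion the authors use, so the ITU-R BT.601 luma weights below are an assumption, as are the function names `to_gray` and `gray_to_three_channel`. Replicating the gray value across three channels is also an assumption, included only because backbones that expect three-channel input are common.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
RgbImage = List[List[Pixel]]
GrayImage = List[List[int]]

# Assumed luma weights (ITU-R BT.601); the abstract does not specify the formula.
R_W, G_W, B_W = 0.299, 0.587, 0.114


def to_gray(image: RgbImage) -> GrayImage:
    """Convert rows of (r, g, b) pixels in 0..255 to single-channel gray values."""
    return [
        [int(round(R_W * r + G_W * g + B_W * b)) for (r, g, b) in row]
        for row in image
    ]


def gray_to_three_channel(gray: GrayImage) -> RgbImage:
    """Replicate each gray value into three identical channels (hypothetical step)."""
    return [[(v, v, v) for v in row] for row in gray]


if __name__ == "__main__":
    visible = [[(255, 0, 0), (0, 255, 0)],
               [(0, 0, 255), (255, 255, 255)]]
    gray = to_gray(visible)
    print(gray)
    print(gray_to_three_channel(gray))
```

After this step the color information that infrared images lack has been discarded from the visible images, so both inputs carry intensity only. That is the intuition behind the gray-infrared setting; the sketch says nothing about how PFINet itself is implemented.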

Author information

Corresponding author

Correspondence to Wen-Bin Shao.

Supplementary Information

ESM 1

(PDF 261 kb)

About this article

Cite this article

Liu, YJ., Shao, WB. & Sun, XR. Learn Robust Pedestrian Representation Within Minimal Modality Discrepancy for Visible-Infrared Person Re-Identification. J. Comput. Sci. Technol. 37, 641–651 (2022). https://doi.org/10.1007/s11390-022-2146-1

  • DOI: https://doi.org/10.1007/s11390-022-2146-1
