Cross-modality person re-identification with triple-attentive feature aggregation

Multimedia Tools and Applications

Abstract

Cross-modal person re-identification between the visible (RGB) and infrared (IR) modalities is extremely important for nighttime surveillance applications. Beyond the cross-modal differences caused by different camera spectra, RGB-IR person re-identification also suffers from large cross-modal and intra-modal variations caused by different camera views and person poses. Moreover, existing visible-infrared person re-identification (VI-ReID) methods tend to learn global representations with limited discriminative power and weak robustness to noisy images. In this paper, we propose a novel triple-attentive aggregation (TAANet) learning method that mines intra-modal hierarchical and cross-modal graph-level contextual cues for VI-ReID. We propose an intra-modal hybrid weight attention module, which extracts discriminative local aggregated features by mining channel and local feature relationships. To enhance robustness to noisy samples, we introduce an improved triplet loss combined with a center loss that takes into account the distance between each sample and its closest different classes, so that a certain distance is maintained between classes and the discrimination of features improves. Extensive experiments show that TAANet outperforms state-of-the-art methods in a variety of settings.
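
The loss described in the abstract can be illustrated with a minimal sketch. It is not the authors' formulation, which the abstract does not give. It assumes one common reading: each identity has a center (the mean of its features), and a sample is penalized when it is not closer to its own center than to the nearest center of a different identity by at least a margin. The function names `class_centers` and `center_triplet_loss` and the `margin` parameter are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def class_centers(features, labels):
    """Mean feature vector per identity label (assumed definition of a center)."""
    sums, counts = {}, {}
    for f, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(f))
        for i, v in enumerate(f):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def center_triplet_loss(features, labels, margin=0.3):
    """Hinge loss on (distance to own center) minus (distance to nearest other center).

    Assumption: the batch contains at least two identities; otherwise there is
    no different-class center and min() raises ValueError.
    """
    centers = class_centers(features, labels)
    total = 0.0
    for f, y in zip(features, labels):
        d_pos = euclidean(f, centers[y])
        # Only the closest different-class center contributes for this sample.
        d_neg = min(euclidean(f, c) for k, c in centers.items() if k != y)
        total += max(0.0, margin + d_pos - d_neg)
    return total / len(features)


# Toy batch: two identities with overlapping 2-D features.
features = [[0.0, 0.0], [2.0, 0.0], [1.0, 0.0], [3.0, 0.0]]
labels = [0, 0, 1, 1]
loss = center_triplet_loss(features, labels, margin=0.5)
```

In a VI-ReID setting the batch would mix RGB and IR features of the same identities. This sketch ignores modality and only shows how a margin to the nearest different class keeps classes apart.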

Author information

Corresponding author

Correspondence to Songhao Zhu.

About this article

Cite this article

Huang, P., Zhu, S., Wang, D. et al. Cross-modality person re-identification with triple-attentive feature aggregation. Multimed Tools Appl 81, 4455–4473 (2022). https://doi.org/10.1007/s11042-021-11739-6

  • DOI: https://doi.org/10.1007/s11042-021-11739-6
