Modality Synergy Complement Learning with Cascaded Aggregation for Visible-Infrared Person Re-Identification

Zhang, Yiyuan; Zhao, Sanyuan; Kang, Yuhao; Shen, Jianbing

doi:10.1007/978-3-031-19781-9_27

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13674)

Included in the following conference series:

European Conference on Computer Vision

Abstract

Visible-Infrared Re-Identification (VI-ReID) is a challenging image retrieval task, because the discrepancy between the visible and infrared modalities produces large intra-class variations. Most existing methods either bridge the two modalities by learning modality-invariant features or generate an intermediate modality to improve performance. In contrast, this paper proposes a novel framework, the Modality Synergy Complement Learning Network (MSCLNet) with Cascaded Aggregation. Its basic idea is to synergize the two modalities to construct diverse representations that carry identity-discriminative semantics and less noise, and then to complement these synergistic representations with the advantages of each modality. Furthermore, we propose the Cascaded Aggregation strategy for fine-grained optimization of the feature distribution, which progressively aggregates feature embeddings at the subclass, intra-class, and inter-class levels. Extensive experiments on the SYSU-MM01 and RegDB datasets show that MSCLNet outperforms the state of the art by a large margin. On the large-scale SYSU-MM01 dataset, our model achieves 76.99% Rank-1 accuracy and 71.64% mAP. Our code will be available at https://github.com/bitreidgroup/VI-ReID-MSCLNet.
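
The Rank-1 accuracy and mAP values quoted in the abstract are standard retrieval metrics. The following is a minimal sketch of how such metrics are commonly computed from a query-gallery distance matrix. It is not the authors' evaluation code: the function name `rank1_and_map` and the toy data are illustrative assumptions, and dataset-specific evaluation protocols are omitted.

```python
import numpy as np


def rank1_and_map(dist, query_ids, gallery_ids):
    """Compute Rank-1 accuracy and mean average precision (mAP).

    dist        : (num_query, num_gallery) array, smaller means more similar.
    query_ids   : identity label of each query (e.g. infrared images).
    gallery_ids : identity label of each gallery item (e.g. visible images).
    Hypothetical helper for illustration only.
    """
    dist = np.asarray(dist, dtype=float)
    query_ids = np.asarray(query_ids)
    gallery_ids = np.asarray(gallery_ids)

    # Sort the gallery for every query from closest to farthest.
    order = np.argsort(dist, axis=1)
    # matches[q, r] is True when the r-th ranked gallery item shares the query identity.
    matches = gallery_ids[order] == query_ids[:, None]

    # Queries without any correct gallery item cannot be scored; skip them.
    matches = matches[matches.any(axis=1)]

    # Rank-1: fraction of queries whose top-ranked gallery item is correct.
    rank1 = matches[:, 0].mean()

    # Average precision: mean of precision values taken at each correct hit.
    hits_so_far = np.cumsum(matches, axis=1)
    ranks = np.arange(1, matches.shape[1] + 1)
    precision_at_hits = (hits_so_far / ranks) * matches
    ap = precision_at_hits.sum(axis=1) / matches.sum(axis=1)

    return float(rank1), float(ap.mean())


# Toy example: 2 queries, 3 gallery items (made-up distances and labels).
toy_dist = [[0.1, 0.5, 0.3],
            [0.2, 0.4, 0.1]]
toy_rank1, toy_map = rank1_and_map(toy_dist, [0, 1], [0, 1, 0])
print(f"Rank-1: {toy_rank1:.4f}, mAP: {toy_map:.4f}")
```

In the toy example, the first query retrieves both of its matches at the top of the ranking, while the second query finds its only match at rank 3, so the two metrics differ.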

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant 61902027, and the Start-up Research Grant (SRG) of University of Macau.

Author information

Authors and Affiliations

School of Computer Science, Beijing Institute of Technology, Beijing, China
Yiyuan Zhang, Sanyuan Zhao, Yuhao Kang & Jianbing Shen
SKL-IOTSC, Department of Computer and Information Science, University of Macau, Taipa, Macau
Jianbing Shen

Corresponding author

Correspondence to Sanyuan Zhao.

Editor information

Editors and Affiliations

Tel Aviv University, Tel Aviv, Israel
Shai Avidan
University College London, London, UK
Gabriel Brostow
Google AI, Accra, Ghana
Moustapha Cissé
University of Catania, Catania, Italy
Giovanni Maria Farinella
Facebook (United States), Menlo Park, CA, USA
Tal Hassner

About this paper

Cite this paper

Zhang, Y., Zhao, S., Kang, Y., Shen, J. (2022). Modality Synergy Complement Learning with Cascaded Aggregation for Visible-Infrared Person Re-Identification. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13674. Springer, Cham. https://doi.org/10.1007/978-3-031-19781-9_27

DOI: https://doi.org/10.1007/978-3-031-19781-9_27
Published: 23 October 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19780-2
Online ISBN: 978-3-031-19781-9
eBook Packages: Computer Science, Computer Science (R0)
