Boosting Generalization Performance in Person Re-identification

Cheng, Lidong; Kuang, Zhenyu; Zhang, Hongyang; Ding, Xinghao; Huang, Yue

doi:10.1007/978-981-99-8549-4_15

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14434)

Included in the following conference series:

Chinese Conference on Pattern Recognition and Computer Vision (PRCV)

Abstract

Generalizable person re-identification (ReID) has gained significant attention in recent years, as it poses the greater challenge of recognizing individuals across different domains and in unseen scenarios. Existing methods are typically limited to a single visual modality, which makes it difficult to capture rich semantic information across domains. Recently, pre-trained vision-language models such as CLIP have shown promising performance in various tasks by linking visual representations with their corresponding text descriptions. This enables them to capture diverse high-level semantics from the accompanying text and to obtain transferable features. However, the adoption of CLIP in person ReID has been hindered because the labels are typically identity indices rather than descriptive texts. To address this limitation, we propose a novel Cross-modal framework wIth Conditional Prompt (CICP) based on CLIP. It includes a Description Prompt Module (DPM) that pre-trains a set of prompts to compensate for the lack of textual information in person ReID. In addition, we propose a Prompt Generalization Module (PGM), which incorporates a lightweight network that generates a conditional token for each image. This module shifts the prompts from being tied to a fixed class set to being specific to each input instance, thereby enhancing the domain generalization capability of the whole framework. Through extensive experiments, we show that the proposed method outperforms state-of-the-art (SOTA) approaches on popular benchmark datasets.
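
The abstract does not give the architecture of the DPM or the PGM, so the following is only a minimal sketch of the general idea of instance-conditioned prompts, written in the style of conditional prompt learning for CLIP-like models. Everything named here is an assumption made for illustration: the `MetaNet` bottleneck MLP, the additive shift of the context vectors, the per-identity tokens standing in for pre-trained description prompts, and the mean-pooling `toy_text_encoder` that replaces a real frozen text encoder. NumPy with random weights is used so that the example runs without any pre-trained model.

```python
import numpy as np

rng = np.random.default_rng(0)


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


class MetaNet:
    """Hypothetical lightweight network: image feature -> one conditional token."""

    def __init__(self, feat_dim, token_dim, hidden_dim):
        self.w1 = rng.normal(0.0, 0.02, (feat_dim, hidden_dim))
        self.b1 = np.zeros(hidden_dim)
        self.w2 = rng.normal(0.0, 0.02, (hidden_dim, token_dim))
        self.b2 = np.zeros(token_dim)

    def __call__(self, image_features):
        hidden = np.maximum(image_features @ self.w1 + self.b1, 0.0)  # ReLU
        return hidden @ self.w2 + self.b2  # (batch, token_dim)


def conditional_prompts(context, id_tokens, cond_token):
    """Build one prompt per (image, identity) pair.

    context:    (n_ctx, d)  shared learnable context vectors
    id_tokens:  (n_ids, d)  learnable per-identity tokens (assumed stand-in
                            for description prompts)
    cond_token: (batch, d)  per-image conditional token
    returns:    (batch, n_ids, n_ctx + 1, d)
    """
    batch, n_ids, dim = cond_token.shape[0], id_tokens.shape[0], context.shape[1]
    # Assumption: the conditional token shifts every shared context vector.
    shifted = context[None, :, :] + cond_token[:, None, :]  # (batch, n_ctx, d)
    shifted = np.broadcast_to(shifted[:, None], (batch, n_ids) + shifted.shape[1:])
    ids = np.broadcast_to(id_tokens[None, :, None, :], (batch, n_ids, 1, dim))
    return np.concatenate([shifted, ids], axis=2)


def toy_text_encoder(prompts):
    """Stand-in for a frozen text encoder: mean-pool the token embeddings."""
    return prompts.mean(axis=2)  # (batch, n_ids, d)


def identity_logits(image_features, text_features, temperature=0.07):
    """Cosine similarity between each image and its instance-specific prompts."""
    img = l2_normalize(image_features)  # (batch, d)
    txt = l2_normalize(text_features)   # (batch, n_ids, d)
    return np.einsum("bd,bnd->bn", img, txt) / temperature


# Toy sizes; a real model would use the encoder's embedding width.
batch, n_ids, n_ctx, dim = 4, 6, 3, 16
image_features = rng.normal(size=(batch, dim))
context = rng.normal(0.0, 0.02, (n_ctx, dim))
id_tokens = rng.normal(0.0, 0.02, (n_ids, dim))

meta_net = MetaNet(feat_dim=dim, token_dim=dim, hidden_dim=dim // 4)
cond = meta_net(image_features)
prompts = conditional_prompts(context, id_tokens, cond)
logits = identity_logits(image_features, toy_text_encoder(prompts))
print(prompts.shape, logits.shape)
```

In this sketch the text-side features depend on the input image through `cond`, which is one way to read the abstract's statement that the prompts become specific to each input instance rather than to a fixed class set. How the paper trains these components and which losses it uses cannot be inferred from the abstract.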

The work was supported in part by the National Natural Science Foundation of China under Grants 82172033, U19B2031, 61971369, 52105126, 82272071, and 62271430, and by the Fundamental Research Funds for the Central Universities under Grant 20720230104.

Author information

Authors and Affiliations

Lab of Smart Data and Signal Processing, Xiamen University, Xiamen, 361001, China
Lidong Cheng, Zhenyu Kuang, Hongyang Zhang, Xinghao Ding & Yue Huang
Institute of Artificial Intelligence, Xiamen University, Xiamen, 361001, China
Lidong Cheng

Corresponding author

Correspondence to Xinghao Ding.

Editor information

Editors and Affiliations

Nanjing University of Information Science and Technology, Nanjing, China
Qingshan Liu
Xiamen University, Xiamen, China
Hanzi Wang
Beijing University of Posts and Telecommunications, Beijing, China
Zhanyu Ma
Sun Yat-sen University, Guangzhou, China
Weishi Zheng
Peking University, Beijing, China
Hongbin Zha
Chinese Academy of Sciences, Beijing, China
Xilin Chen
Chinese Academy of Sciences, Beijing, China
Liang Wang
Xiamen University, Xiamen, China
Rongrong Ji

Cite this paper

Cheng, L., Kuang, Z., Zhang, H., Ding, X., Huang, Y. (2024). Boosting Generalization Performance in Person Re-identification. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14434. Springer, Singapore. https://doi.org/10.1007/978-981-99-8549-4_15

DOI: https://doi.org/10.1007/978-981-99-8549-4_15
Published: 25 December 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8548-7
Online ISBN: 978-981-99-8549-4
eBook Packages: Computer Science, Computer Science (R0)
