Cross-Platform Video Person ReID: A New Benchmark Dataset and Adaptation Approach

Zhang, Shizhou; Luo, Wenlong; Cheng, De; Yang, Qingchun; Ran, Lingyan; Xing, Yinghui; Zhang, Yanning

doi:10.1007/978-3-031-73383-3_16

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 15085))

Included in the following conference series:

European Conference on Computer Vision

245 Accesses

Abstract

In this paper, we construct a large-scale benchmark dataset for Ground-to-Aerial Video-based person Re-Identification, named G2A-VReID, which comprises 185,907 images and 5,576 tracklets, featuring 2,788 distinct identities. To our knowledge, this is the first dataset for video ReID under Ground-to-Aerial scenarios. G2A-VReID dataset has the following characteristics: 1) Drastic view changes; 2) Large number of annotated identities; 3) Rich outdoor scenarios; 4) Huge difference in resolution. Additionally, we propose a new benchmark approach for cross-platform ReID by transforming the cross-platform visual alignment problem into visual-semantic alignment through vision-language model (i.e., CLIP) and applying a parameter-efficient Video Set-Level-Adapter module to adapt image-based foundation model to video ReID tasks, termed VSLA-CLIP. Besides, to further reduce the great discrepancy across the platforms, we also devise the platform-bridge prompts for efficient visual feature alignment. Extensive experiments demonstrate the superiority of the proposed method on all existing video ReID datasets and our proposed G2A-VReID dataset. The code and datasets are available at https://github.com/FHR-L/VSLA-CLIP.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 64.99; Price excludes VAT (USA)

Softcover Book: USD 79.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

GAMa: Cross-View Video Geo-Localization

Situational diversity in video person re-identification: introducing MSA-BUPT dataset

Article Open access 23 May 2024

I2V-CMGAN: Generative Adversarial Cross-Modal Network-Based Image-to-Video Person Re-identification

Article 14 January 2025

References

Aich, A., Zheng, M., Karanam, S., Chen, T., Roy-Chowdhury, A.K., Wu, Z.: Spatio-temporal representation factorization for video-based person re-identification. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 152–162 (2021)
Google Scholar
Bai, S., Ma, B., Chang, H., Huang, R., Chen, X.: Salient-to-broad transition for video person re-identification. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7339–7348 (2022)
Google Scholar
Baltieri, D., Vezzani, R., Cucchiara, R.: 3dpes: 3d people dataset for surveillance and forensics. In: Joint Acm Workshop on Human Gesture & Behavior Understanding (2011)
Google Scholar
Chao, H., He, Y., Zhang, J., Feng, J.: Gaitset: regarding gait as a set for cross-view gait recognition. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 8126–8133 (2019)
Google Scholar
Chen, D., Li, H., Xiao, T., Yi, S., Wang, X.: Video person re-identification with competitive snippet-similarity aggregation and co-attentive snippet embedding. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1169–1178 (2018)
Google Scholar
Chen, G., Rao, Y., Lu, J., Zhou, J.: Temporal coherence or temporal motion: Which is more critical for video-based person re-identification? In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part VIII 16, pp. 660–676. Springer (2020)
Google Scholar
Cheng, D., He, L., Wang, N., Zhang, S., Wang, Z., Gao, X.: Efficient bilateral cross-modality cluster matching for unsupervised visible-infrared person reid. In: Proceedings of the 31st ACM International Conference on Multimedia, pp. 1325–1333 (2023)
Google Scholar
Cheng, D., et al.: Continual all-in-one adverse weather removal with knowledge replay on a unified network structure. IEEE Trans. Multimed. (2024)
Google Scholar
Cheng, D., Zhou, J., Wang, N., Gao, X.: Hybrid dynamic contrast and probability distillation for unsupervised person re-id. IEEE Trans. Image Process. 31, 3334–3346 (2022). https://doi.org/10.1109/TIP.2022.3169693
Article Google Scholar
Chung, D., Tahboub, K., Delp, E.J.: A two stream siamese convolutional neural network for person re-identification. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1983–1991 (2017)
Google Scholar
Dosovitskiy, A., et al.: An image is worth 16x16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)
Eom, C., Lee, G., Lee, J., Ham, B.: Video-based person re-identification with spatial and temporal memory networks. In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 12016–12025 (2021). https://doi.org/10.1109/ICCV48922.2021.01182
Feichtenhofer, C., Pinz, A., Zisserman, A.: Convolutional two-stream network fusion for video action recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1933–1941 (2016)
Google Scholar
Fu, Y., Wang, X., Wei, Y., Huang, T.: Sta: spatial-temporal attention for large-scale video-based person re-identification. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 8287–8294 (2019)
Google Scholar
Gu, X., Chang, H., Ma, B., Zhang, H., Chen, X.: Appearance-preserving 3d convolution for video-based person re-identification. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part II 16, pp. 228–243. Springer (2020)
Google Scholar
He, S., Luo, H., Wang, P., Wang, F., Li, H., Jiang, W.: Transreid: transformer-based object re-identification. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 15013–15022 (October 2021)
Google Scholar
He, T., Jin, X., Shen, X., Huang, J., Chen, Z., Hua, X.S.: Dense interaction learning for video-based person re-identification. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1490–1501 (2021)
Google Scholar
Hirzer, M., Beleznai, C., Roth, P.M., Bischof, H.: Person re-identification by descriptive and discriminative classification. In: Image Analysis: 17th Scandinavian Conference, SCIA 2011, Ystad, Sweden, May 2011. Proceedings 17, pp. 91–102. Springer (2011)
Google Scholar
Hou, R., Ma, B., Chang, H., Gu, X., Shan, S., Chen, X.: Vrstc: occlusion-free video person re-identification. In: CVPR, pp. 7183–7192 (2019)
Google Scholar
Hou, R., Ma, B., Chang, H., Gu, X., Shan, S., Chen, X.: Temporal complementary learning for video person re-identification. In: ECCV, pp. 388–405 (2020)
Google Scholar
Hou, R., Chang, H., Ma, B., Huang, R., Shan, S.: Bicnet-tks: learning efficient spatial-temporal representation for video person re-identification. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2014–2023, June 2021
Google Scholar
Hu, E.J., et al.: LoRA: low-rank adaptation of large language models. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=nZeVKeeFYf9
Jia, M., Tang, L., Chen, B., Cardie, C., Belongie, S.J., Hariharan, B., Lim, S.: Visual prompt tuning. In: ECCV (33). LNCS, vol. 13693, pp. 709–727. Springer (2022)
Google Scholar
Kay, W., et al.: The kinetics human action video dataset. arXiv preprint arXiv:1705.06950 (2017)
Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
Lester, B., Al-Rfou, R., Constant, N.: The power of scale for parameter-efficient prompt tuning. arXiv preprint arXiv:2104.08691 (2021)
Li, H., et al.: Boosting low-data instance segmentation by unsupervised pre-training with saliency prompt. arXiv preprint arXiv:2302.01171 (2023)
Li, J., Wang, J., Tian, Q., Gao, W., Zhang, S.: Global-local temporal representations for video person re-identification. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3958–3967 (2019)
Google Scholar
Li, J., Zhang, S., Huang, T.: Multiscale 3d convolution network for video based person reidentification. In: AAAI, pp. 8618–8625 (2019)
Google Scholar
Li, S., Sun, L., Li, Q.: Clip-reid: exploiting vision-language model for image re-identification without concrete text labels. arXiv preprint arXiv:2211.13977 (2022)
Liu, H., Jie, Z., Jayashree, K., Qi, M., Jiang, J., Yan, S., Feng, J.: Video-based person re-identification with accumulative motion context. IEEE Trans. Circuits Syst. Video Technol. 28(10), 2788–2802 (2017)
Article Google Scholar
Liu, X., Zhang, P., Lu, H.: Video-based person re-identification with long short-term representation learning. In: International Conference on Image and Graphics, pp. 55–67. Springer (2023)
Google Scholar
Liu, X., Zhang, P., Yu, C., Lu, H., Yang, X.: Watching you: Global-guided reciprocal learning for video-based person re-identification. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 13334–13343 (2021)
Google Scholar
Radford, A., et al.: Learning transferable visual models from natural language supervision. In: International Conference on Machine Learning, pp. 8748–8763. PMLR (2021)
Google Scholar
Rasheed, H., khattak, M.U., Maaz, M., Khan, S., Khan, F.S.: Finetuned clip models are efficient video learners. In: The IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023)
Google Scholar
Schroff, F., Kalenichenko, D., Philbin, J.: Facenet: a unified embedding for face recognition and clustering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 815–823 (2015)
Google Scholar
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818–2826 (2016)
Google Scholar
Vaswani, A., et al.: Attention is all you need. Advances in neural information processing systems 30 (2017)
Google Scholar
Wang, L., et al.: Temporal segment networks: towards good practices for deep action recognition. In: European Conference on Computer Vision, pp. 20–36. Springer (2016)
Google Scholar
Wang, X., Zhao, R.: Person re-identification: System design and evaluation overview. In: Person Re-Identification, pp. 351–370. Springer (2014)
Google Scholar
Wang, Y., Zhang, P., Gao, S., Geng, X., Lu, H., Wang, D.: Pyramid spatial-temporal aggregation for video-based person re-identification. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12026–12035 (2021)
Google Scholar
Xing, Y., Wu, Q., Cheng, D., Zhang, S., Liang, G., Wang, P., Zhang, Y.: Dual modality prompt tuning for vision-language pre-trained model. IEEE Trans. Multimedia 26, 2056–2068 (2024). https://doi.org/10.1109/TMM.2023.3291588
Article Google Scholar
Yan, Y., et al.: Learning multi-granular hypergraphs for video-based person re-identification. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2899–2908 (2020)
Google Scholar
Yang, J., Zheng, W.S., Yang, Q., Chen, Y.C., Tian, Q.: Spatial-temporal graph convolutional network for video-based person re-identification. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3289–3299 (2020)
Google Scholar
Yin, J., Wu, A., Zheng, W.S.: Fine-grained person re-identification. Int. J. Comput. Vision 128(6), 1654–1672 (2020). https://doi.org/10.1007/s11263-019-01259-0
Article Google Scholar
Zaheer, M., Kottur, S., Ravanbakhsh, S., Poczos, B., Salakhutdinov, R.R., Smola, A.J.: Deep sets. Advances in neural information processing systems 30 (2017)
Google Scholar
Zang, X., Li, G., Gao, W.: Multidirection and multiscale pyramid in transformer for video-based pedestrian retrieval. IEEE Trans. Industr. Inf. 18(12), 8776–8785 (2022). https://doi.org/10.1109/TII.2022.3151766
Article Google Scholar
Zhang, S., Yang, Y., Wang, P., Liang, G., Zhang, X., Zhang, Y.: Attend to the difference: Cross-modality person re-identification via contrastive correlation. IEEE Trans. Image Process. 30, 8861–8872 (2021). https://doi.org/10.1109/TIP.2021.3120881
Article Google Scholar
Zhang, S., et al.: Person re-identification in aerial imagery. IEEE Trans. Multimedia 23, 281–291 (2021). https://doi.org/10.1109/TMM.2020.2977528
Article Google Scholar
Zhang, Z., Lan, C., Zeng, W., Chen, Z.: Multi-granularity reference-aided attentive feature aggregation for video-based person re-identification. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10407–10416 (2020)
Google Scholar
Zheng, L., et al.: Mars: a video benchmark for large-scale person re-identification. In: ECCV, pp. 868–884 (2016)
Google Scholar
Zhong, Z., Zheng, L., Kang, G., Li, S., Yang, Y.: Random erasing data augmentation. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 13001–13008 (2020)
Google Scholar
Zhou, Z., Huang, Y., Wang, W., Wang, L., Tan, T.: See the forest for the trees: Joint spatial and temporal recurrent neural networks for video-based person re-identification. In: Proceedings of the IEEE CDonference on Computer Vision and Pattern Recognition, pp. 4747–4756 (2017)
Google Scholar
Zhu, K., Guo, H., Zhang, S., Wang, Y., Liu, J., Wang, J., Tang, M.: Aaformer: auto-aligned transformer for person re-identification. IEEE Trans. Neural Networks Learn. Syst. (2023)
Google Scholar

Download references

Acknowledgement

This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant 62101453, 62176198 and 62201467, the Key Research and Development Program of Shaanxi Province under Grant 2024GX-YBXM-135, in part by China Postdoctoral Science Foundation under Grant 2022TQ0260, 2023M742842, in part by the Young Talent Fund of Xi’an Association for Science and Technology under Grant 959202313088, Innovation Capability Support Program of Shaanxi (No. 2024ZC-KJXX-043), the Fundamental Research Funds for the Central Universities No. HYGJZN202331 and the Natural Science Basic Research Program of Shaanxi Province (No. 2022JC-DW-08).

Author information

Authors and Affiliations

Northwestern Polytechnical University, Xi’an, 710000, Shaanxi, China
Shizhou Zhang, Wenlong Luo, Qingchun Yang, Lingyan Ran, Yinghui Xing & Yanning Zhang
Xidian University, Xi’an, 710000, Shaanxi, China
De Cheng

Authors

Shizhou Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Wenlong Luo
View author publications
You can also search for this author in PubMed Google Scholar
De Cheng
View author publications
You can also search for this author in PubMed Google Scholar
Qingchun Yang
View author publications
You can also search for this author in PubMed Google Scholar
Lingyan Ran
View author publications
You can also search for this author in PubMed Google Scholar
Yinghui Xing
View author publications
You can also search for this author in PubMed Google Scholar
Yanning Zhang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to De Cheng .

Editor information

Editors and Affiliations

University of Birmingham, Birmingham, UK
Aleš Leonardis
University of Trento, Trento, Italy
Elisa Ricci
Technical University of Darmstadt, Darmstadt, Germany
Stefan Roth
Princeton University, Princeton, NJ, USA
Olga Russakovsky
Czech Technical University in Prague, Prague, Czech Republic
Torsten Sattler
École des Ponts ParisTech, Marne-la-Vallée, France
Gül Varol

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (mp4 97906 KB)

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Zhang, S. et al. (2025). Cross-Platform Video Person ReID: A New Benchmark Dataset and Adaptation Approach. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15085. Springer, Cham. https://doi.org/10.1007/978-3-031-73383-3_16

Download citation

DOI: https://doi.org/10.1007/978-3-031-73383-3_16
Published: 03 November 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73382-6
Online ISBN: 978-3-031-73383-3
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Cross-Platform Video Person ReID: A New Benchmark Dataset and Adaptation Approach

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

GAMa: Cross-View Video Geo-Localization

Situational diversity in video person re-identification: introducing MSA-BUPT dataset

I2V-CMGAN: Generative Adversarial Cross-Modal Network-Based Image-to-Video Person Re-identification

References

Acknowledgement

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

1 Electronic supplementary material

Supplementary material 1 (mp4 97906 KB)

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Subscribe and save

Buy Now

Navigation

Cross-Platform Video Person ReID: A New Benchmark Dataset and Adaptation Approach

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

GAMa: Cross-View Video Geo-Localization

Situational diversity in video person re-identification: introducing MSA-BUPT dataset

I2V-CMGAN: Generative Adversarial Cross-Modal Network-Based Image-to-Video Person Re-identification

References

Acknowledgement

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

1 Electronic supplementary material

Supplementary material 1 (mp4 97906 KB)

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation