Temporal Correlation-Diversity Representations for Video-Based Person Re-Identification

  • Conference paper
Pattern Recognition and Computer Vision (PRCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13534)

Abstract

Video-based person re-identification is a challenging task due to illumination variations, occlusions, viewpoint changes, and pedestrian misalignment. Most previous works focus mainly on temporal correlation features, which leads to a loss of detailed frame-level information. In this paper, we emphasize the importance of preserving both the correlation and the diversity of multi-frame features simultaneously. To this end, we propose a Temporal Correlation-Diversity Representation (TCDR) network that strengthens frame-level pedestrian feature representations and temporal feature aggregation. Specifically, to capture correlated yet diverse temporal features, we propose a Temporal-Guided Frame Feature Enhancement (TGFE) module, which explores temporal correlation from a global perspective and enhances frame-level features to achieve temporal diversity. Furthermore, we propose a Temporal Feature Integration (TFI) module to aggregate multi-frame features. Finally, we propose a novel progressive smooth loss to alleviate the influence of noisy frames. Extensive experiments show that our method achieves state-of-the-art performance on the MARS, DukeMTMC-VideoReID, and LS-VID datasets.
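The abstract outlines the architecture at a high level but gives no implementation details. For readers who want a concrete picture, the following is a minimal, hypothetical PyTorch sketch of the general idea: frame-level features enhanced by a clip-level (global temporal) context, followed by attention-weighted aggregation across frames. The class names mirror TGFE and TFI only for readability; the channel-gating design, the per-frame scoring, and all shapes and hyperparameters are assumptions made for illustration and are not taken from the paper.

```python
# Hypothetical sketch of temporal-guided frame enhancement and aggregation.
# The internals below are illustrative assumptions; the paper's actual
# TGFE/TFI designs are not specified in the abstract.
import torch
import torch.nn as nn


class TemporalGuidedFrameEnhancement(nn.Module):
    """Enhance each frame's feature map using a clip-level (global temporal) context."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel gating conditioned on the temporally averaged feature (assumed design).
        self.gate = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T, C, H, W) frame-level feature maps from a shared 2D backbone.
        b, t, c, h, w = x.shape
        global_ctx = x.mean(dim=(1, 3, 4))                 # (B, C) clip-level context
        weights = self.gate(global_ctx).view(b, 1, c, 1, 1)
        # Residual enhancement keeps frame-specific (diverse) detail while
        # injecting clip-level (correlated) guidance.
        return x + x * weights


class TemporalFeatureIntegration(nn.Module):
    """Aggregate T frame descriptors into a single clip descriptor."""

    def __init__(self, channels: int):
        super().__init__()
        self.score = nn.Linear(channels, 1)  # per-frame importance score (assumed)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T, C, H, W) -> per-frame vectors via spatial average pooling.
        frame_feats = x.mean(dim=(3, 4))                    # (B, T, C)
        attn = torch.softmax(self.score(frame_feats), dim=1)  # (B, T, 1)
        return (attn * frame_feats).sum(dim=1)              # (B, C) clip feature


if __name__ == "__main__":
    clip = torch.randn(2, 8, 2048, 16, 8)   # B=2 clips, T=8 frames, ResNet-50-like maps
    enhanced = TemporalGuidedFrameEnhancement(2048)(clip)
    clip_feat = TemporalFeatureIntegration(2048)(enhanced)
    print(clip_feat.shape)                   # torch.Size([2, 2048])
```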

Acknowledgment

This research work is supported by the National Key Research and Development Program of China (2021AAA0140203), the Zhejiang Provincial Key Research and Development Program of China (No. 2021C01164), and the Project of the Chinese Academy of Sciences (E141020). Juan Cao thanks the Nanjing Government Affairs and Public Opinion Research Institute for the support of "CaoJuan Studio" and thanks Chi Peng, Jingjing Jiang, Qiang Liu, and Yu Dai for their help.

Author information

Corresponding author

Correspondence to Sheng Tang.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Gong, L., Zhang, R., Tang, S., Cao, J. (2022). Temporal Correlation-Diversity Representations for Video-Based Person Re-Identification. In: Yu, S., et al. Pattern Recognition and Computer Vision. PRCV 2022. Lecture Notes in Computer Science, vol 13534. Springer, Cham. https://doi.org/10.1007/978-3-031-18907-4_8

  • DOI: https://doi.org/10.1007/978-3-031-18907-4_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-18906-7

  • Online ISBN: 978-3-031-18907-4

  • eBook Packages: Computer Science, Computer Science (R0)
