Abstract
Video-based person re-identification is a challenging task due to illumination variations, occlusions, viewpoint changes, and pedestrian misalignment. Most previous works focus on temporal correlation features, which leads to a loss of detailed frame-level information. In this paper, we emphasize the importance of preserving both the correlation and the diversity of multi-frame features simultaneously. To this end, we propose a Temporal Correlation-Diversity Representation (TCDR) network that enhances both the representation of frame-level pedestrian features and the ability to aggregate temporal features. Specifically, to capture correlated but diverse temporal features, we propose a Temporal-Guided Frame Feature Enhancement (TGFE) module, which explores temporal correlation from a global perspective and enhances frame-level features to achieve temporal diversity. Furthermore, we propose a Temporal Feature Integration (TFI) module to aggregate multi-frame features. Finally, we propose a novel progressive smooth loss to alleviate the influence of noisy frames. Extensive experiments show that our method achieves state-of-the-art performance on the MARS, DukeMTMC-VideoReID, and LS-VID datasets.
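The abstract does not specify how TGFE and TFI are implemented, so the following is only a minimal PyTorch sketch of the general idea it describes: a global temporal context gating frame-level features (so frames stay correlated while a residual path preserves frame-specific detail), followed by attention-weighted aggregation of the frames. Every design choice here (the squeeze-and-excitation-style gate, the linear attention scorer, 2048-D features) is an illustrative assumption, not the paper's actual module design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TGFESketch(nn.Module):
    """Hypothetical temporal-guided frame feature enhancement:
    a clip-level (global) descriptor produces a channel gate that is
    applied to every frame; the residual connection keeps per-frame
    detail, aiming at correlated-but-diverse frame features."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                     # x: (B, T, C) frame features
        g = x.mean(dim=1)                     # global temporal context (B, C)
        gate = self.fc(g).unsqueeze(1)        # correlation gate (B, 1, C)
        return x + x * gate                   # residual preserves diversity

class TFISketch(nn.Module):
    """Hypothetical temporal feature integration: attention-weighted
    pooling of frame features into one clip-level representation."""
    def __init__(self, channels: int):
        super().__init__()
        self.score = nn.Linear(channels, 1)

    def forward(self, x):                     # x: (B, T, C)
        w = F.softmax(self.score(x), dim=1)   # per-frame weights (B, T, 1)
        return (w * x).sum(dim=1)             # aggregated clip feature (B, C)

if __name__ == "__main__":
    feats = torch.randn(4, 8, 2048)           # 4 clips, 8 frames, 2048-D
    clip = TFISketch(2048)(TGFESketch(2048)(feats))
    print(clip.shape)                          # torch.Size([4, 2048])
```

In this sketch, the gate injects clip-level (correlated) context while the residual path keeps frame-specific (diverse) information, and the learned frame weights give a simple stand-in for temporal aggregation; the paper's actual TGFE and TFI modules, and its progressive smooth loss, may differ substantially.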
Acknowledgment
This research was supported by the National Key Research and Development Program of China (2021AAA0140203), the Zhejiang Provincial Key Research and Development Program of China (No. 2021C01164), and the Project of the Chinese Academy of Sciences (E141020). Juan Cao thanks the Nanjing Government Affairs and Public Opinion Research Institute for its support of "CaoJuan Studio", and thanks Chi Peng, Jingjing Jiang, Qiang Liu, and Yu Dai for their help.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Gong, L., Zhang, R., Tang, S., Cao, J. (2022). Temporal Correlation-Diversity Representations for Video-Based Person Re-Identification. In: Yu, S., et al. (eds.) Pattern Recognition and Computer Vision. PRCV 2022. Lecture Notes in Computer Science, vol 13534. Springer, Cham. https://doi.org/10.1007/978-3-031-18907-4_8
DOI: https://doi.org/10.1007/978-3-031-18907-4_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-18906-7
Online ISBN: 978-3-031-18907-4
eBook Packages: Computer Science, Computer Science (R0)