
Temporal Coherence or Temporal Motion: Which Is More Critical for Video-Based Person Re-identification?

  • Conference paper

Computer Vision – ECCV 2020 (ECCV 2020)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12353)

Abstract

Video-based person re-identification aims to match pedestrians across consecutive video sequences. While a rich line of work focuses solely on extracting motion features from pedestrian videos, we show in this paper that temporal coherence plays a more critical role. To distill the temporal coherence part of the video representation from frame representations, we propose a simple yet effective Adversarial Feature Augmentation (AFA) method, which highlights the temporal coherence features by introducing adversarially augmented temporal motion noise. Specifically, we disentangle the video representation into temporal coherence and motion parts and randomly change the scale of the temporal motion features as the adversarial noise. The proposed AFA method is a general lightweight component that can be readily incorporated into various methods at negligible cost. We conduct extensive experiments on three challenging datasets, MARS, iLIDS-VID, and DukeMTMC-VideoReID; the experimental results verify our argument and demonstrate the effectiveness of the proposed method.
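The disentangle-then-perturb idea in the abstract can be sketched numerically. Below is a minimal, hypothetical illustration (not the authors' implementation): the temporal-coherence part is taken as the per-sequence mean of the frame features, the motion part as the per-frame residual, and the residual is rescaled by a random factor to act as adversarial noise. The function name, the mean/residual decomposition, and the scale range are all assumptions for illustration.

```python
import numpy as np

def adversarial_feature_augmentation(frame_feats, scale_range=(0.0, 2.0), rng=None):
    """Sketch of the AFA idea: split (T, D) frame features into a
    temporal-coherence part (shared across frames) and a temporal-motion
    part (per-frame residual), then rescale the motion part by a random
    factor so the model must rely on the coherence part."""
    rng = np.random.default_rng() if rng is None else rng
    coherence = frame_feats.mean(axis=0, keepdims=True)  # (1, D) shared content
    motion = frame_feats - coherence                     # (T, D) per-frame variation
    s = rng.uniform(*scale_range)                        # random motion scale
    return coherence + s * motion                        # (T, D) augmented features
```

Note that because the motion residual has zero mean over time, the perturbation leaves the sequence-level (temporally pooled) representation unchanged while distorting the frame-to-frame dynamics.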

G. Chen and Y. Rao contributed equally.



Acknowledgement

This work was supported in part by the National Key Research and Development Program of China under Grant 2017YFA0700802, in part by the National Natural Science Foundation of China under Grant 61822603, Grant U1813218, Grant U1713214, and Grant 61672306, in part by Beijing Natural Science Foundation under Grant No. L172051, in part by Beijing Academy of Artificial Intelligence (BAAI), in part by a grant from the Institute for Guo Qiang, Tsinghua University, in part by the Shenzhen Fundamental Research Fund (Subject Arrangement) under Grant JCYJ20170412170602564, and in part by Tsinghua University Initiative Scientific Research Program.

Author information

Correspondence to Jiwen Lu.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Chen, G., Rao, Y., Lu, J., Zhou, J. (2020). Temporal Coherence or Temporal Motion: Which Is More Critical for Video-Based Person Re-identification? In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12353. Springer, Cham. https://doi.org/10.1007/978-3-030-58598-3_39

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-58598-3_39

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58597-6

  • Online ISBN: 978-3-030-58598-3

  • eBook Packages: Computer Science (R0)
