Abstract
Video-based person re-identification aims to match pedestrians across consecutive video sequences. While a rich line of work focuses solely on extracting motion features from pedestrian videos, we show in this paper that temporal coherence plays a more critical role. To distill the temporal-coherence part of the video representation from frame representations, we propose a simple yet effective Adversarial Feature Augmentation (AFA) method, which highlights temporal-coherence features by introducing adversarially augmented temporal-motion noise. Specifically, we disentangle the video representation into temporal-coherence and motion parts, and randomly change the scale of the temporal-motion features to serve as adversarial noise. The proposed AFA method is a general, lightweight component that can be readily incorporated into various methods at negligible cost. We conduct extensive experiments on three challenging datasets, MARS, iLIDS-VID, and DukeMTMC-VideoReID; the experimental results support our argument and demonstrate the effectiveness of the proposed method.
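The disentangle-and-rescale idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: here the temporal-coherence part is approximated by the temporal mean of the per-frame features, the motion part is the per-frame residual around that mean, and the function name `afa_augment` and the `scale_range` parameter are hypothetical.

```python
import numpy as np

def afa_augment(frame_feats, scale_range=(0.0, 2.0), rng=None):
    """Sketch of Adversarial Feature Augmentation (AFA).

    frame_feats: (T, D) array of per-frame features for one video.
    The temporal mean stands in for the temporal-coherence part;
    the residuals around it stand in for the temporal-motion part.
    The motion part is rescaled by a random factor, acting as
    adversarial temporal-motion noise, and the parts are recombined.
    """
    rng = np.random.default_rng() if rng is None else rng
    coherence = frame_feats.mean(axis=0, keepdims=True)  # (1, D) coherence part
    motion = frame_feats - coherence                     # (T, D) motion part
    alpha = rng.uniform(*scale_range)                    # random motion scale
    return coherence + alpha * motion
```

Because the motion residuals have zero temporal mean by construction, the augmentation perturbs only the motion component; the coherence component of the video representation is preserved, which is what forces the model to rely on it.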
G. Chen and Y. Rao—Equal contribution.
Acknowledgement
This work was supported in part by the National Key Research and Development Program of China under Grant 2017YFA0700802, in part by the National Natural Science Foundation of China under Grant 61822603, Grant U1813218, Grant U1713214, and Grant 61672306, in part by Beijing Natural Science Foundation under Grant No. L172051, in part by Beijing Academy of Artificial Intelligence (BAAI), in part by a grant from the Institute for Guo Qiang, Tsinghua University, in part by the Shenzhen Fundamental Research Fund (Subject Arrangement) under Grant JCYJ20170412170602564, and in part by Tsinghua University Initiative Scientific Research Program.
© 2020 Springer Nature Switzerland AG
Cite this paper
Chen, G., Rao, Y., Lu, J., Zhou, J. (2020). Temporal Coherence or Temporal Motion: Which Is More Critical for Video-Based Person Re-identification?. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science(), vol 12353. Springer, Cham. https://doi.org/10.1007/978-3-030-58598-3_39
Print ISBN: 978-3-030-58597-6
Online ISBN: 978-3-030-58598-3