Abstract
Due to imperfect person detection results and posture changes, temporal appearance misalignment is unavoidable in video-based person re-identification (ReID). In this case, 3D convolution may destroy the appearance representations of person video clips and is therefore harmful to ReID. To address this problem, we propose Appearance-Preserving 3D Convolution (AP3D), which is composed of two components: an Appearance-Preserving Module (APM) and a 3D convolution kernel. With APM aligning adjacent feature maps at the pixel level, the subsequent 3D convolution can model temporal information while maintaining the quality of the appearance representation. AP3D is easy to combine with existing 3D ConvNets by simply replacing the original 3D convolution kernels with AP3Ds. Extensive experiments demonstrate the effectiveness of AP3D for video-based ReID, and the results on three widely used datasets surpass the state of the art. Code is available at: https://github.com/guxinqian/AP3D.
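To make the align-then-convolve structure described above concrete, the following PyTorch sketch pairs a simplified cross-frame alignment step with a standard 3D convolution. It is only an illustration of the stated idea, not the authors' implementation: the module names (SimpleAlignmentModule, AlignedConv3d) are placeholders, and the affinity-based warping merely stands in for APM.

# Illustrative sketch: align each frame's feature map to the centre frame,
# then apply a plain 3D convolution over the aligned clip.
# All names are placeholders; this is not the released AP3D code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleAlignmentModule(nn.Module):
    """Warps each frame's feature map toward the centre frame via soft
    attention over spatial positions (a stand-in for APM)."""

    def __init__(self, in_channels, reduction=4):
        super().__init__()
        self.embed = nn.Conv2d(in_channels, in_channels // reduction, kernel_size=1)

    def forward(self, x):
        # x: (B, C, T, H, W) clip-level feature maps
        b, c, t, h, w = x.shape
        center = x[:, :, t // 2]                         # reference frame, (B, C, H, W)
        q = self.embed(center).flatten(2)                # (B, C', H*W)
        aligned = []
        for i in range(t):
            frame = x[:, :, i]                           # (B, C, H, W)
            k = self.embed(frame).flatten(2)             # (B, C', H*W)
            # Affinity between centre-frame positions and this frame's positions.
            affinity = torch.bmm(q.transpose(1, 2), k)   # (B, H*W, H*W)
            attn = F.softmax(affinity / (q.size(1) ** 0.5), dim=-1)
            v = frame.flatten(2)                         # (B, C, H*W)
            warped = torch.bmm(v, attn.transpose(1, 2))  # features gathered onto centre grid
            aligned.append(warped.view(b, c, h, w))
        return torch.stack(aligned, dim=2)               # (B, C, T, H, W)


class AlignedConv3d(nn.Module):
    """Alignment followed by an ordinary 3D convolution, mirroring the
    'align first, then model temporal information' structure."""

    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.align = SimpleAlignmentModule(in_channels)
        self.conv3d = nn.Conv3d(in_channels, out_channels, kernel_size=3, padding=1)

    def forward(self, x):
        return self.conv3d(self.align(x))


if __name__ == "__main__":
    clip = torch.randn(2, 64, 4, 16, 8)    # (batch, channels, frames, H, W)
    out = AlignedConv3d(64, 128)(clip)
    print(out.shape)                       # torch.Size([2, 128, 4, 16, 8])

Because the alignment and the 3D convolution are packaged in one module, such a block can be dropped into an existing 3D ConvNet wherever a plain nn.Conv3d would otherwise be used, which is the replacement strategy the abstract describes.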
Acknowledgement
This work is partially supported by the Natural Science Foundation of China (NSFC) under Grants 61876171 and 61976203.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Gu, X., Chang, H., Ma, B., Zhang, H., Chen, X. (2020). Appearance-Preserving 3D Convolution for Video-Based Person Re-identification. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds) Computer Vision – ECCV 2020. Lecture Notes in Computer Science, vol. 12347. Springer, Cham. https://doi.org/10.1007/978-3-030-58536-5_14
DOI: https://doi.org/10.1007/978-3-030-58536-5_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58535-8
Online ISBN: 978-3-030-58536-5
eBook Packages: Computer Science (R0)