READ: Reciprocal Attention Discriminator for Image-to-Video Re-identification

Shim, Minho; Ho, Hsuan-I; Kim, Jinhyung; Wee, Dongyoon

doi:10.1007/978-3-030-58568-6_20

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12359)

Included in the following conference series:

European Conference on Computer Vision

Abstract

Person re-identification (re-ID) is the problem of visually identifying a person given a database of identities. In this work, we focus on image-to-video re-ID, which compares a single query image to videos in the gallery. The main challenges are the asymmetric association between an image and a video, and overcoming the difference caused by the additional temporal dimension. To this end, we propose an attention-aware discriminator architecture. The attention operates across different modalities, and even across different identities, to aggregate useful spatio-temporal information for comparison. The information is fused into a unified feature, from which a similarity score is predicted. We demonstrate the performance of the method on image-to-video person re-identification benchmarks (DukeMTMC-VideoReID and MARS).
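
As a rough illustration of the idea in the abstract, the sketch below lets features from a query image attend to features from a gallery video and vice versa, pools each result, and maps the pair to a score. It is not the authors' architecture: the function names (`attend`, `mean_pool`, `similarity_score`), the toy 2-dimensional features, and the use of cosine similarity in place of a learned fusion and prediction head are all assumptions made for this example.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, memory):
    """Scaled dot-product attention: each query becomes a weighted
    average of the memory vectors (memory serves as keys and values)."""
    d = len(queries[0])
    out = []
    for q in queries:
        w = softmax([dot(q, m) / math.sqrt(d) for m in memory])
        out.append([sum(wi * m[i] for wi, m in zip(w, memory)) for i in range(d)])
    return out

def mean_pool(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def similarity_score(image_feats, video_feats):
    # Attention in both directions: image -> video and video -> image.
    img_ctx = mean_pool(attend(image_feats, video_feats))
    vid_ctx = mean_pool(attend(video_feats, image_feats))
    # Assumption: cosine similarity rescaled to [0, 1] stands in for
    # the learned fusion and prediction layers of a real discriminator.
    norm = math.sqrt(dot(img_ctx, img_ctx) * dot(vid_ctx, vid_ctx)) + 1e-12
    return 0.5 * (dot(img_ctx, vid_ctx) / norm + 1.0)

# Hypothetical toy features: one image vector, two frame vectors per video.
image = [[1.0, 0.0]]
same_person = [[1.0, 0.0], [0.9, 0.1]]
other_person = [[0.0, 1.0], [0.1, 0.9]]
print(similarity_score(image, same_person) > similarity_score(image, other_person))
```

In a trained model the feature vectors would come from image and video backbones, and the final score would come from learned layers rather than a fixed cosine.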

Author information

Authors and Affiliations

Seoul, South Korea
Minho Shim
Department of Computer Science, ETH Zürich, Zürich, Switzerland
Hsuan-I Ho
School of Electrical Engineering, KAIST, Daejeon, South Korea
Jinhyung Kim
Clova AI, NAVER Corporation, Seongnam, South Korea
Dongyoon Wee

Editor information

Editors and Affiliations

University of Oxford, Oxford, UK
Andrea Vedaldi
Graz University of Technology, Graz, Austria
Horst Bischof
University of Freiburg, Freiburg im Breisgau, Germany
Thomas Brox
University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Jan-Michael Frahm

About this paper

Cite this paper

Shim, M., Ho, HI., Kim, J., Wee, D. (2020). READ: Reciprocal Attention Discriminator for Image-to-Video Re-identification. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12359. Springer, Cham. https://doi.org/10.1007/978-3-030-58568-6_20

DOI: https://doi.org/10.1007/978-3-030-58568-6_20
Published: 13 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58567-9
Online ISBN: 978-3-030-58568-6
eBook Packages: Computer Science, Computer Science (R0)
