Large-Scale Video-Based Person Re-identification via Non-local Attention and Feature Erasing

Yang, Zhao; Chang, Zhigang; Zheng, Shibao

doi:10.1007/978-981-15-3341-9_27

Large-Scale Video-Based Person Re-identification via Non-local Attention and Feature Erasing

Zhao Yang¹¹,
Zhigang Chang¹¹ &
Shibao Zheng¹¹

Conference paper
First Online: 16 February 2020

593 Accesses
2 Citations

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 1181))

Abstract

Encoding the video tracks of person to an aggregative representation is the key for video-based person re-identification (re-ID), where average pooling or RNN methods are typically used to aggregating frame-level features. However, It is still difficult to deal with the spatial misalignment caused by occlusion, posture changes and camera views. Inspired by the success of non-local block in video analysis, we use a non-local attention block as a spatial-temporal attention mechanism to handle the spatial-temporal misalignment problem. Moreover, partial occlusion is widely occurred in video sequences. We propose a local feature branch to tackle the partial occlusion problem by using feature erasing in the frame-level feature map. Therefore, our network is composed by two-branch, the global branch via non-local attention encoding the global feature and the local feature branch grasping the local feature. In evaluation, the global feature and local feature are concatenated to obtain a more discriminative feature. We conduct extensive experiments on two challenging datasets (MARS and iLIDS-VID). The experimental results demonstrate that our method is comparable with the state-of-the-art methods in these datasets.

This is a preview of subscription content, log in via an institution.

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 79.99; Price excludes VAT (USA)

Softcover Book: USD 99.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

References

Ahmed, E., Jones, M., Marks, T.K.: An improved deep learning architecture for person re-identification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3908–3916 (2015)
Google Scholar
Buades, A., Coll, B., Morel, J.M.: A non-local algorithm for image denoising. In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), vol. 2, pp. 60–65. IEEE (2005)
Google Scholar
Chen, D., Li, H., Xiao, T., Yi, S., Wang, X.: Video person re-identification with competitive snippet-similarity aggregation and co-attentive snippet embedding. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1169–1178 (2018)
Google Scholar
Chen, S.Z., Guo, C.C., Lai, J.H.: Deep ranking for person re-identification via joint representation learning. IEEE Trans. Image Process. 25(5), 2353–2367 (2016)
Article MathSciNet Google Scholar
Dai, Z., Chen, M., Zhu, S., Tan, P.: Batch feature erasing for person re-identification and beyond. arXiv preprint arXiv:1811.07130 (2018)
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. IEEE (2009)
Google Scholar
Ding, S., Lin, L., Wang, G., Chao, H.: Deep feature learning with relative distance comparison for person re-identification. Pattern Recogn. 48(10), 2993–3003 (2015)
Article Google Scholar
Felzenszwalb, P.F., McAllester, D.A., Ramanan, D., et al.: A discriminatively trained, multiscale, deformable part model. In: CVPR, vol. 2, p. 7 (2008)
Google Scholar
Fu, Y., et al.: Horizontal pyramid matching for person re-identification. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 8295–8302 (2019)
Article Google Scholar
Gao, J., Nevatia, R.: Revisiting temporal modeling for video-based person ReID. arXiv preprint arXiv:1805.02104 (2018)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Google Scholar
Hermans, A., Beyer, L., Leibe, B.: In defense of the triplet loss for person re-identification. arXiv preprint arXiv:1703.07737 (2017)
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
Li, S., Bak, S., Carr, P., Wang, X.: Diversity regularized spatiotemporal attention for video-based person re-identification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 369–378 (2018)
Google Scholar
Liao, X., He, L., Yang, Z., Zhang, C.: Video-based person re-identification via 3d convolutional networks and non-local attention. In: Jawahar, C.V., Li, H., Mori, G., Schindler, K. (eds.) ACCV 2018. LNCS, vol. 11366. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-20876-9_39
Chapter Google Scholar
Liu, Y., Yan, J., Ouyang, W.: Quality aware network for set to set recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5790–5799 (2017)
Google Scholar
Loy, C.C., Xiang, T., Gong, S.: Multi-camera activity correlation analysis. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1988–1995. IEEE (2009)
Google Scholar
Luo, H., Gu, Y., Liao, X., Lai, S., Jiang, W.: Bag of tricks and a strong baseline for deep person re-identification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (2019)
Google Scholar
McLaughlin, N., Martinez del Rincon, J., Miller, P.: Recurrent convolutional network for video-based person re-identification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1325–1334 (2016)
Google Scholar
Sun, Y., Zheng, L., Yang, Y., Tian, Q., Wang, S.: Beyond part models: person retrieval with refined part pooling (and a strong convolutional baseline). In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 480–496 (2018)
Chapter Google Scholar
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818–2826 (2016)
Google Scholar
Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, pp. 5998–6008 (2017)
Google Scholar
Wang, G., Yuan, Y., Chen, X., Li, J., Zhou, X.: Learning discriminative features with multiple granularities for person re-identification. In: 2018 ACM Multimedia Conference on Multimedia Conference, pp. 274–282. ACM (2018)
Google Scholar
Wang, T., Gong, S., Zhu, X., Wang, S.: Person re-identification by video ranking. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8692. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10593-2_45
Chapter Google Scholar
Wang, X.: Intelligent multi-camera video surveillance: a review. Pattern Recogn. Lett. 34(1), 3–19 (2013)
Article Google Scholar
Wang, X., Girshick, R., Gupta, A., He, K.: Non-local neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7794–7803 (2018)
Google Scholar
Yan, Y., Ni, B., Song, Z., Ma, C., Yan, Y., Yang, X.: Person re-identification via recurrent feature aggregation. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9910. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46466-4_42
Chapter Google Scholar
Yu, S.I., Yang, Y., Hauptmann, A.: Harry Potter’s Marauder’s map: localizing and tracking multiple persons-of-interest by nonnegative discretization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3714–3720 (2013)
Google Scholar
Roshan Zamir, A., Dehghan, A., Shah, M.: GMCP-Tracker: global multi-object tracking using generalized minimum clique graphs. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7573. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33709-3_25
Chapter Google Scholar
Zheng, L., et al.: MARS: a video benchmark for large-scale person re-identification. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9910. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46466-4_52
Chapter Google Scholar
Zhou, Z., Huang, Y., Wang, W., Wang, L., Tan, T.: See the forest for the trees: joint spatial and temporal recurrent neural networks for video-based person re-identification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4747–4756 (2017)
Google Scholar

Download references

Author information

Authors and Affiliations

Institute of Image Communication and Network Engineering, Shanghai Key Labs of Digital Media Processing and Transmission, Shanghai Jiao Tong University, SEIEE Buildings 5-421, Shanghai, 200240, China
Zhao Yang, Zhigang Chang & Shibao Zheng

Authors

Zhao Yang
View author publications
You can also search for this author in PubMed Google Scholar
Zhigang Chang
View author publications
You can also search for this author in PubMed Google Scholar
Shibao Zheng
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Shibao Zheng .

Editor information

Editors and Affiliations

Shanghai Jiao Tong University, Shanghai, China
Guangtao Zhai
Shanghai Jiao Tong University, Shanghai, China
Jun Zhou
Shanghai Jiao Tong University, Shanghai, China
Hua Yang
Shanghai University, Shanghai, China
Ping An
Shanghai Jiao Tong University, Shanghai, China
Xiaokang Yang

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Yang, Z., Chang, Z., Zheng, S. (2020). Large-Scale Video-Based Person Re-identification via Non-local Attention and Feature Erasing. In: Zhai, G., Zhou, J., Yang, H., An, P., Yang, X. (eds) Digital TV and Wireless Multimedia Communication. IFTC 2019. Communications in Computer and Information Science, vol 1181. Springer, Singapore. https://doi.org/10.1007/978-981-15-3341-9_27

Download citation

DOI: https://doi.org/10.1007/978-981-15-3341-9_27
Published: 16 February 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-3340-2
Online ISBN: 978-981-15-3341-9
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics