Spatial-temporal aware network for video-based person re-identification

Wang, Jun; Zhao, Qi; Jia, Di; Huang, Ziqing; Zhang, Miaohui; Ren, Xing

doi:10.1007/s11042-023-16911-8

Published: 29 September 2023

Volume 83, pages 36355–36373, (2024)

Multimedia Tools and Applications

Abstract

Video-based person re-identification (ReID) aims to match the same pedestrian across different cameras in complex real-world scenes. Because of occlusion and misalignment of human regions between video sequences, the extracted representations cannot capture all the useful information about a person and therefore lack completeness and discriminative power. To address this issue, we propose a new Spatial-Temporal Aware Network (STAN), which mines and complements person features using feature relationships and intra-frame cues. Because the feature nodes of the same person are highly correlated across different video sequences, we use the learned pedestrian feature nodes to construct a temporal relationship graph. Specifically, the Temporal Interaction Module locates relevant pedestrian regions by modeling the correlation between feature nodes and reference nodes, and the Temporal Attention Module selects more specific reference nodes. We then apply the Spatial Reference Module to adaptively mine each frame for fine-grained cues, making the spatial-temporal features of persons more discriminative. Extensive experiments demonstrate the strong performance of STAN, which achieves 88.0% mAP and 89.5% Rank-1 accuracy on the MARS dataset.
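
The abstract reports results as mAP and Rank-1 accuracy. As a reading aid, the sketch below shows how these two retrieval metrics are commonly computed from a query-by-gallery distance matrix. It is not taken from the paper: the function name `rank1_and_map`, the toy distances and the identity labels are all illustrative assumptions, and the same-camera filtering that ReID benchmark protocols usually apply is omitted for brevity.

```python
from typing import List, Sequence, Tuple


def rank1_and_map(
    dist: Sequence[Sequence[float]],
    query_ids: Sequence[int],
    gallery_ids: Sequence[int],
) -> Tuple[float, float]:
    """Return (Rank-1 accuracy, mAP) for a query-by-gallery distance matrix.

    Simplifications (assumptions of this sketch): no same-camera filtering,
    and queries without any matching gallery entry are skipped.
    """
    rank1_hits = 0
    average_precisions: List[float] = []
    for q, row in enumerate(dist):
        # Rank gallery entries from the closest to the farthest.
        order = sorted(range(len(row)), key=lambda g: row[g])
        matches = [gallery_ids[g] == query_ids[q] for g in order]
        if not any(matches):
            continue
        # Rank-1: is the closest gallery entry the correct identity?
        rank1_hits += int(matches[0])
        # Average precision: mean of precision@k over the ranks k
        # at which a correct match appears.
        hits = 0
        precisions: List[float] = []
        for k, is_match in enumerate(matches, start=1):
            if is_match:
                hits += 1
                precisions.append(hits / k)
        average_precisions.append(sum(precisions) / len(precisions))
    n = len(average_precisions)
    if n == 0:
        raise ValueError("no query has a matching gallery entry")
    return rank1_hits / n, sum(average_precisions) / n


# Toy example: 2 queries and 4 gallery tracklets (made-up values).
toy_dist = [
    [0.1, 0.9, 0.4, 0.8],  # query 0, identity 7
    [0.3, 0.6, 0.7, 0.2],  # query 1, identity 3
]
toy_query_ids = [7, 3]
toy_gallery_ids = [7, 3, 7, 5]
rank1, mean_ap = rank1_and_map(toy_dist, toy_query_ids, toy_gallery_ids)
print(f"Rank-1 = {rank1:.3f}, mAP = {mean_ap:.3f}")
```

In the toy data, query 0 retrieves both of its true matches first, while query 1 finds its only match at rank 3. Rank-1 counts only the top result, whereas mAP also reflects how far down the ranking the remaining matches fall.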

Data Availability

The datasets generated during and/or analysed during the current study are not publicly available due to [REASON(S) WHY DATA ARE NOT PUBLIC] but are available from the corresponding author on reasonable request.

Author information

Authors and Affiliations

School of Artificial Intelligence, Henan University, Zhengzhou, 450001, Henan, China
Jun Wang, Qi Zhao, Di Jia, Ziqing Huang, Miaohui Zhang & Xing Ren

Contributions

All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Jun Wang, Qi Zhao, Di Jia, Yonghua Zhang and Miaohui Zhang. The first draft of the manuscript was written by Xing Ren, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Xing Ren.

Ethics declarations

Conflict of interest

We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work, and that there is no professional or other personal interest of any nature or kind in any product, service and/or company that could be construed as influencing the position presented in, or the review of, the manuscript entitled “Spatial-temporal aware network for video-based person re-identification”.

Ethical and informed consent for data used

The data used did not involve human participants or animal studies.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, J., Zhao, Q., Jia, D. et al. Spatial-temporal aware network for video-based person re-identification. Multimed Tools Appl 83, 36355–36373 (2024). https://doi.org/10.1007/s11042-023-16911-8

Received: 18 April 2023
Revised: 26 July 2023
Accepted: 05 September 2023
Published: 29 September 2023
Issue Date: April 2024
DOI: https://doi.org/10.1007/s11042-023-16911-8
