
Spatial-temporal aware network for video-based person re-identification

Multimedia Tools and Applications

Abstract

Video-based pedestrian re-identification (ReID) aims to match the same pedestrian across different cameras in complex real-world scenes. Because of occlusion and misalignment of human regions between video sequences, the extracted representations often fail to capture all the useful information about a person and therefore lack integrity and discriminability. To resolve this issue, we propose a new Spatial-Temporal Aware Network (STAN), which mines and complements person features using feature relationships and intra-frame cues. Since the feature nodes of the same person are highly correlated across different video sequences, we use the learned pedestrian feature nodes to construct a temporal relationship graph. Specifically, the Temporal Interaction Module locates relevant pedestrian regions by modeling the correlation between feature nodes and reference nodes, and the Temporal Attention Module selects more specific reference nodes. The Spatial Reference Module then adaptively mines fine-grained cues from each frame, making the spatial-temporal characteristics of persons more discriminative. Extensive experiments demonstrate the strong performance of STAN; for example, it achieves 88.0% mAP and 89.5% Rank-1 accuracy on the MARS dataset.
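The abstract describes the modules only at a high level, so the following is not the authors' STAN implementation. It is a minimal, parameter-free NumPy sketch of the general idea of weighting feature nodes by their correlation with a reference node. Assumptions (all hypothetical): each clip is a `(T, N, C)` array of T frames with N spatial nodes of C channels; the reference node is chosen by cosine similarity to the clip mean instead of a learned Temporal Attention Module; no explicit graph is built; the function names and the temperature `tau` are illustrative only.

```python
import numpy as np


def l2_normalize(x: np.ndarray, axis: int = -1, eps: float = 1e-12) -> np.ndarray:
    """Scale vectors along `axis` to unit L2 norm; eps guards against zero vectors."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def select_reference(nodes: np.ndarray, tau: float = 0.1) -> np.ndarray:
    """Hypothetical reference-node selection (NOT the paper's Temporal Attention Module).

    nodes: (T, N, C) array, i.e. N spatial feature nodes of C channels per frame.
    Each frame is scored by the cosine similarity of its mean node to the clip
    mean, so frames that disagree with the rest of the clip (e.g. occluded ones)
    receive less weight. Returns a single reference node of shape (C,).
    """
    frame_feats = nodes.mean(axis=1)                                # (T, C)
    clip_mean = frame_feats.mean(axis=0)                            # (C,)
    scores = l2_normalize(frame_feats) @ l2_normalize(clip_mean)    # (T,)
    frame_w = softmax(scores / tau)                                 # (T,), sums to 1
    return frame_w @ frame_feats                                    # (C,)


def aggregate_with_reference(nodes: np.ndarray, tau: float = 0.1):
    """Weight every node by its cosine correlation with the reference node.

    Returns the clip-level feature (C,) and the per-frame node weights (T, N).
    """
    ref = select_reference(nodes, tau)                              # (C,)
    affinity = l2_normalize(nodes) @ l2_normalize(ref)              # (T, N) cosine scores
    node_w = softmax(affinity / tau, axis=1)                        # normalized within each frame
    frame_out = np.einsum("tn,tnc->tc", node_w, nodes)              # (T, C) weighted sum of nodes
    return frame_out.mean(axis=0), node_w                           # temporal average pooling


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clip = rng.normal(size=(8, 16 * 8, 256))  # 8 frames, 16x8 spatial grid, 256 channels
    feature, weights = aggregate_with_reference(clip)
    print(feature.shape, weights.shape)  # (256,) (8, 128)
```

Normalizing the node weights within each frame means every frame still contributes a pooled vector, while the reference node decides which spatial regions of that frame dominate it. In a trained network the cosine scores and fixed temperature would presumably be replaced by learned projections.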


Fig. 1–6 (figures not reproduced in this preview)


Data Availability

The datasets generated and/or analysed during the current study are not publicly available due to [REASON(S) WHY DATA ARE NOT PUBLIC] but are available from the corresponding author on reasonable request.


Author information

Contributions

All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Jun Wang, Qi Zhao, Di Jia, Yonghua Zhang and Miaohui Zhang. The first draft of the manuscript was written by Xing Ren, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Xing Ren.

Ethics declarations

Conflict of interest

We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work, and that we have no professional or other personal interest of any nature or kind in any product, service and/or company that could be construed as influencing the position presented in, or the review of, the manuscript entitled “Spatial-temporal aware network for video-based person re-identification”.

Ethical and informed consent for data used

The data used did not involve studies with human participants or animals.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wang, J., Zhao, Q., Jia, D. et al. Spatial-temporal aware network for video-based person re-identification. Multimed Tools Appl 83, 36355–36373 (2024). https://doi.org/10.1007/s11042-023-16911-8


  • DOI: https://doi.org/10.1007/s11042-023-16911-8
