
Person Reidentification using 3D inception based Spatio-temporal features learning, attribute recognition, and Reranking

Multimedia Tools and Applications

Abstract

Video-based person re-identification is the task of identifying pedestrians in video sequences captured by multiple non-overlapping cameras. Successive frames in a video clip capture a pedestrian's motion patterns and show the person's appearance from varying angles and in different body poses, and thus provide critical features for countering occlusion, pose variation, viewpoint change, etc. This article proposes a novel person re-identification methodology built on a 3D Inception-based person re-identification model comprising four three-dimensional (3D) Inception modules with 3D convolution and pooling layers. The 3D Inception modules expand the receptive fields of neurons in both the temporal and spatial dimensions. As a result, the model learns discriminative appearance features along with pedestrians' long-term and short-term motion patterns without any motion approximation module. The model is trained with a unified loss function that integrates center loss with the usual identification loss to reduce intra-class differences while increasing inter-class differences. The proposed method also incorporates an attribute recognition model to identify discriminative attributes in the video frames. The spatio-temporal and attribute features are then used by a reranking method, which returns the k most similar video clips for a given input. The effectiveness of the proposed method is validated through extensive experiments on three realistic surveillance video datasets: MARS, DukeMTMC-VideoReID, and iLIDS.
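As an illustration of the unified loss described in the abstract, the following minimal sketch combines a softmax cross-entropy identification loss with a center loss in its standard form (half the squared distance between a feature and its class center). It is not the authors' implementation: the function names, the weighting `lam`, and the toy values are assumptions made for illustration, and the abstract does not state how the two terms are weighted.

```python
import math


def softmax_cross_entropy(logits, label):
    """Identification loss for one clip: -log softmax(logits)[label]."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def center_loss(feature, center):
    """Half the squared Euclidean distance between a feature and its class center."""
    return 0.5 * sum((f - c) ** 2 for f, c in zip(feature, center))


def unified_loss(features, logits, labels, centers, lam=0.5):
    """Batch mean of identification loss + lam * center loss.

    lam is a hypothetical weight; the abstract does not give a value.
    """
    total = 0.0
    for feat, z, y in zip(features, logits, labels):
        total += softmax_cross_entropy(z, y) + lam * center_loss(feat, centers[y])
    return total / len(labels)


# Toy batch: two clips, two identities, 2-D features (all values are made up).
features = [[1.0, 0.0], [0.0, 2.0]]
logits = [[2.0, 0.0], [0.0, 2.0]]
labels = [0, 1]
centers = {0: [1.0, 0.0], 1: [0.0, 1.0]}

loss = unified_loss(features, logits, labels, centers)
```

The identification term pushes features of different identities apart, while the center term pulls each clip's feature toward the center of its own identity. In practice the class centers would be updated during training rather than fixed as they are here.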


Data availability

The data that support the findings of this study are openly available and are cited in the text.


Author information

Corresponding author

Correspondence to Vivek Tiwari.

Ethics declarations

Funding and/or Conflicts of Interest/Competing interests: On behalf of all the authors, I declare that there is no conflict of interest, either commercial or non-financial, among the authors.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Choudhary, M., Tiwari, V., Jain, S. et al. Person Reidentification using 3D inception based Spatio-temporal features learning, attribute recognition, and Reranking. Multimed Tools Appl 83, 2007–2030 (2024). https://doi.org/10.1007/s11042-023-15473-z


  • DOI: https://doi.org/10.1007/s11042-023-15473-z
