Abstract
Video-based person re-identification is the task of identifying pedestrians in video sequences captured by multiple non-overlapping cameras. Successive frames in a video clip capture pedestrians' motion patterns and show a person's appearance from varying angles and in different body poses; they therefore provide critical features for countering occlusion, pose variation, viewpoint change, and similar challenges. This article proposes a novel person re-identification methodology built on a 3D Inception-based person re-identification model, which comprises four three-dimensional (3D) Inception modules with 3D convolution and pooling layers. The 3D Inception modules expand the receptive fields of neurons in both the temporal and spatial dimensions. As a result, the model learns discriminative appearance features along with pedestrians' long-term and short-term motion patterns without any motion approximation module. The model is trained with a unified loss function that integrates center loss with the usual identification loss to reduce intra-class differences while increasing inter-class differences. The proposed method also incorporates an attribute recognition model to identify discriminative attributes in the video frames. A re-ranking method then uses the spatio-temporal and attribute features to generate the k most similar video clips for a given input. The effectiveness of the proposed method is validated through extensive experiments on three realistic surveillance video datasets: MARS, DukeMTMC-VideoReID, and iLIDS.
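The unified loss mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes that the identification loss is softmax cross-entropy, that center loss takes the common form of half the squared distance between a feature and its class center, that both terms are averaged over the batch, and that they are combined with a weight `lam`. The function names and the value of `lam` are hypothetical.

```python
import math

def identification_loss(logits, labels):
    """Mean softmax cross-entropy over a batch (assumed form of the ID loss)."""
    total = 0.0
    for row, label in zip(logits, labels):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_sum_exp - row[label]
    return total / len(labels)

def center_loss(features, labels, centers):
    """Mean of 0.5 * ||x_i - c_{y_i}||^2, pulling features toward class centers."""
    total = 0.0
    for feat, label in zip(features, labels):
        total += 0.5 * sum((f - c) ** 2 for f, c in zip(feat, centers[label]))
    return total / len(labels)

def unified_loss(logits, features, labels, centers, lam=0.01):
    """Identification loss plus weighted center loss; lam is a hypothetical weight."""
    return identification_loss(logits, labels) + lam * center_loss(features, labels, centers)

# Toy batch: two clips, two identities, 2-D features (illustrative values only).
features = [[1.0, 0.0], [0.0, 1.0]]
logits = [[2.0, 0.0], [0.0, 2.0]]
labels = [0, 1]
centers = [[1.0, 0.0], [0.0, 1.0]]
loss = unified_loss(logits, features, labels, centers)
```

In this toy batch every feature sits exactly on its class center, so the center term contributes nothing and the total equals the identification loss. Moving a feature away from its center raises the total, which is the intra-class compactness effect the abstract describes.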
Data availability
The data that support the findings of this study are openly available and are cited in the text.
Ethics declarations
Funding and/or conflicts of interest/competing interests: I declare on behalf of the authors that there is no conflict of interest, either non-financial or commercial, among the authors.
About this article
Cite this article
Choudhary, M., Tiwari, V., Jain, S. et al. Person Reidentification using 3D inception based Spatio-temporal features learning, attribute recognition, and Reranking. Multimed Tools Appl 83, 2007–2030 (2024). https://doi.org/10.1007/s11042-023-15473-z
DOI: https://doi.org/10.1007/s11042-023-15473-z