Abstract
Video-based person re-identification has attracted wide attention because it plays a crucial role in many video surveillance applications. The task is to match image sequences of pedestrians recorded by non-overlapping cameras. As in many visual recognition problems, variations in pose, viewpoint, and illumination, together with occlusion, make this task non-trivial. To increase the robustness of features to such variations and to occlusion, this paper designs an aligned multi-part image model inspired by the human visual attention mechanism. The model first applies pose estimation to align the pedestrians, then divides each image into parts and extracts multi-part appearance features. In addition, we present independent metric learning to combine the multi-part appearance features with spatial-temporal features: each feature is fed into distance metric learning separately, which yields several metric kernels. These kernels are fused with weights learned by the attention measure. This fusion scheme lets the features complement one another more effectively. In experiments, we analyze the effectiveness of the major components. Extensive experiments on two public benchmark datasets, iLIDS-VID and PRID-2011, demonstrate the effectiveness of the proposed method.
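The fusion step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the metric learner or the attention measure, so the code assumes a diagonal metric per feature group as a stand-in for a learned metric kernel, and fixed hand-set weights as a stand-in for attention-learned weights. The group names (`head`, `torso`, `legs`, `motion`) and all function names are hypothetical.

```python
from typing import Dict, List, Sequence

# One vector per feature group (e.g. body part or spatial-temporal feature).
Features = Dict[str, Sequence[float]]


def group_distance(x: Sequence[float], y: Sequence[float],
                   scale: Sequence[float]) -> float:
    """Squared distance under a diagonal metric.

    Assumption: the diagonal `scale` stands in for a metric that would be
    learned independently for this feature group.
    """
    return sum(s * (a - b) ** 2 for a, b, s in zip(x, y, scale))


def fused_distance(probe: Features, candidate: Features,
                   metrics: Dict[str, Sequence[float]],
                   weights: Dict[str, float]) -> float:
    """Weighted sum of per-group distances, with weights normalised to sum to 1.

    Assumption: `weights` stands in for weights produced by an attention measure.
    """
    total = sum(weights.values())
    return sum(
        (weights[name] / total)
        * group_distance(probe[name], candidate[name], metrics[name])
        for name in metrics
    )


def rank_gallery(probe: Features, gallery: Dict[str, Features],
                 metrics: Dict[str, Sequence[float]],
                 weights: Dict[str, float]) -> List[str]:
    """Return gallery identities sorted from best to worst match."""
    return sorted(
        gallery,
        key=lambda pid: fused_distance(probe, gallery[pid], metrics, weights),
    )


# Toy data: four hypothetical feature groups with two-dimensional vectors.
metrics = {name: [1.0, 1.0] for name in ("head", "torso", "legs", "motion")}
weights = {"head": 0.1, "torso": 0.4, "legs": 0.3, "motion": 0.2}

probe = {"head": [0.0, 0.0], "torso": [1.0, 0.0],
         "legs": [0.0, 1.0], "motion": [1.0, 1.0]}
gallery = {
    "id_a": {"head": [0.0, 0.0], "torso": [1.0, 0.0],
             "legs": [0.0, 1.0], "motion": [1.0, 1.0]},
    "id_b": {"head": [0.0, 0.0], "torso": [3.0, 2.0],
             "legs": [2.0, 1.0], "motion": [1.0, 0.0]},
}

ranking = rank_gallery(probe, gallery, metrics, weights)
```

Keeping one metric per feature group means a weak or occluded part can be down-weighted at fusion time without retraining the other metrics, which is the kind of complementarity the abstract refers to.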





This work is supported by the National Natural Science Foundation of China Grant 61876056 and Grant 61771180.
Cite this article
Wu, J., Jiang, J., Qi, M. et al. Independent metric learning with aligned multi-part features for video-based person re-identification. Multimed Tools Appl 78, 29323–29341 (2019). https://doi.org/10.1007/s11042-018-7119-6
DOI: https://doi.org/10.1007/s11042-018-7119-6