View-specific subspace learning and re-ranking for semi-supervised person re-identification
Introduction
Person re-identification (re-ID), which aims at matching people across multiple non-overlapping camera views, has become a hot research topic in the pattern recognition community [1]. It aims to re-identify a target in one camera after he/she disappears from another. This is quite challenging, since human appearance often exhibits severe variations across different camera views due to arbitrary changes in viewpoint, illumination, pose, occlusion, etc.
Despite numerous recent efforts, some issues remain unsolved in person re-ID, one of which is that most existing methods are based on supervised learning and require extensive labeled image pairs for training, i.e. annotation reliance. However, manual label annotation is labor-intensive and time-consuming, and sometimes even unreliable, since a large number of images need to be inspected across multiple camera views [2]. Moreover, the reliance on labels severely limits applicability and scalability in real-world settings, where a huge number of images are available but not labeled.
One intuitive solution is to carry out person re-ID using only unlabeled data (i.e. unsupervised), which are far more abundant than label information. Several works have pursued this direction [3], [4], [5], but their matching performance still lags well behind that of supervised counterparts. The main reason is that unsupervised methods cannot benefit from labeled cross-view discriminative information, which is fundamental to matching the same identity while discriminating against impostors [6]. In light of this, a semi-supervised framework was proposed in [7]. Nonetheless, the work in [7] treats samples from different views in the same manner and learns a unitary transformation for images across various camera views. As a result, the view-specific interference caused by the particular indoor/outdoor environment within each camera view is neglected, such as specific illumination conditions in one camera and occlusions in another [2].
To this end, we put forward a subspace learning approach that learns specific projections for each camera view, also under a semi-supervised setting but taking view-specific bias into consideration. First, we learn an initial projection for each view from the limited manually labeled matching pairs. Second, the unlabeled data are mapped into a subspace with the initial projections, in which pseudo cross-view correspondence relationships can be constructed. Next, new projections are learned by augmenting the original objective with a graph Laplacian regularization term that encodes the pseudo cross-view relationships. Besides, the pseudo-classes are refined with the updated projections and merged with the labeled images to create new training sets. This procedure runs iteratively until the pseudo cross-view correspondence relationships stop changing. The flowchart of the proposed View-Specific Semi-supervised Subspace Learning (VS-SSL) algorithm is illustrated in Fig. 1.
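The iterative loop above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's actual objective: plain per-view PCA stands in for the labeled-pair projection learning, nearest-neighbour matching in the projected space stands in for pseudo cross-view correspondence construction, and the refit step stands in for the graph-Laplacian-regularized update; all function names are hypothetical.

```python
import numpy as np

def learn_projection(X, dim):
    # Stand-in for supervised projection learning: plain PCA on one view.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:dim].T                              # (d, dim) projection

def pseudo_pair(Za, Zb):
    # Pseudo cross-view correspondences: each projected sample from view A
    # is matched to its nearest neighbour among projected view-B samples.
    d2 = ((Za[:, None, :] - Zb[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)

def vs_ssl_sketch(Xa, Xb, dim=2, max_iter=10):
    # Iterate: project each view with its OWN matrix, rebuild pseudo pairs,
    # refit, and stop once the pairings stop changing.
    Wa, Wb = learn_projection(Xa, dim), learn_projection(Xb, dim)
    pairs = pseudo_pair(Xa @ Wa, Xb @ Wb)
    for _ in range(max_iter):
        # Refit view B's projection on the pseudo-matched samples
        # (a crude stand-in for the graph-regularized update).
        Wb = learn_projection(Xb[pairs], dim)
        new_pairs = pseudo_pair(Xa @ Wa, Xb @ Wb)
        if np.array_equal(new_pairs, pairs):
            break
        pairs = new_pairs
    return Wa, Wb, pairs
```

The key design point mirrored here is that each view keeps its own projection matrix, so view-specific bias can be absorbed per camera rather than forced into one shared transformation.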
Moreover, after computing the initial distances with the proposed VS-SSL algorithm, a re-ranking step can be adopted to further enhance the performance. Re-ranking is a common practice in person re-ID [8], [9]: it re-estimates the similarities between the probe and gallery images so as to place more relevant images at the top of the returned list. However, most existing re-ranking methods need to recompute the neighborhood lists for each query-gallery image pair, which is computationally demanding. To cope with this inefficiency, the second contribution of this work is an efficient re-ranking strategy built on the assumption that true matches and the query should not only share a multitude of mutual nearest neighbors, but also rank highly in each other's ranking lists.
To be specific, we first build the Expanded Cross Neighborhood (ECN) [9] for the query and gallery images, which consists of the immediate top neighbors (first level) of an image and the neighbors (second level) of each element in the first-level list, see Fig. 2. Second, we compute the overlap ratio between the expanded neighborhoods of each query-gallery image pair, which is leveraged as the contextual similarity. The underlying assumption is that the more similar neighbors two images share, the more likely they are of the same person. Next, we sort the elements in each gallery image's ECN according to their similarity to that gallery image, and propose a novel reciprocal content similarity measure of an image pair based on their positions in each other's ranking list. The new similarity score after re-ranking is defined as the product of the contextual and reciprocal content similarities, and is further combined with the original similarity scores to increase robustness. The revised ranking list is obtained by sorting the final similarity scores in descending order.
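The re-ranking steps above can be condensed into a short sketch. This is an illustrative simplification rather than the exact method: global ranking lists replace the per-gallery sorted ECNs, the reciprocal content similarity is modeled as a simple inverse product of mutual ranks, and the fusion weight `alpha` is an assumed parameter; all names are hypothetical.

```python
import numpy as np

def topk(dist_row, k):
    # Indices of the k nearest items for one row of the distance matrix.
    return set(np.argsort(dist_row)[:k])

def expanded_cross_neighborhood(dist, idx, k=5):
    # First-level neighbours of `idx`, expanded with each neighbour's
    # own top-k list (second-level neighbours).
    first = topk(dist[idx], k)
    ecn = set(first)
    for j in first:
        ecn |= topk(dist[j], k)
    return ecn

def rerank(dist, orig_sim, k=5, alpha=0.5):
    n = dist.shape[0]
    ecns = [expanded_cross_neighborhood(dist, i, k) for i in range(n)]
    # 1-based rank of every item in every other item's ranking list.
    ranks = np.argsort(np.argsort(dist, axis=1), axis=1) + 1
    sim = np.zeros((n, n))
    for q in range(n):
        for g in range(n):
            # Contextual similarity: overlap ratio of the two ECNs.
            ctx = len(ecns[q] & ecns[g]) / len(ecns[q] | ecns[g])
            # Reciprocal content similarity: large only when q and g sit
            # near the top of each other's lists (illustrative form).
            rec = 1.0 / (ranks[q, g] * ranks[g, q])
            sim[q, g] = ctx * rec
    # Fuse with the original similarity scores for robustness.
    return alpha * sim + (1 - alpha) * orig_sim
```

Note how a false match that ranks high for the query but ranks the query low in its own list is penalized by the reciprocal term, while the ECN overlap term captures shared context.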
Intrinsically, the proposed re-ranking strategy is independent of the VS-SSL method and can be readily applied to any ranking result. Here we integrate the two techniques into a novel pipeline for person re-ID, which proves effective in experiments on the challenging cross-view datasets VIPeR [10], PRID450S [11], PRID2011 [12], CUHK01 [13] and the multi-view dataset Market-1501 [14]. The results show that VS-SSL and the re-ranking approach both contribute to the overall performance gain and complement each other well. To summarize, the main contributions of this paper are:
- A semi-supervised VS-SSL approach for person re-ID is put forward, which can effectively exploit the limited labeled data and the abundant unlabeled images to find specific projections for each camera view, thus alleviating the view-specific biases and sparing the requirement of exhaustive data annotation.
- A novel re-ranking strategy is proposed, which can find more prospective correct matches and effectively suppress the harm of false matches, taking both contextual and reciprocal content similarity into consideration.
- A pragmatic framework for person re-ID is introduced and extensive experiments on widely-used datasets are conducted, demonstrating that the proposed method can effectively enhance the performance of semi-supervised person re-ID.
The rest of the paper is organized as follows: a brief review of related works is presented in Section 2. Details of the proposed method are introduced in Section 3. Experimental results are presented in Section 4. Finally, the conclusions are summarized in Section 5.
Semi-supervised person re-ID
Existing person re-ID methods either focus on constructing discriminative features for pedestrian images or aim at learning distance metrics or subspaces such that intra-class distances are reduced and inter-class distances are enlarged. This is achieved with equidistance constraints [15], impostor rejection [16], or semantic projection learning [17]. However, most of these methods require extensive manual annotation, which is the main bottleneck of supervised learning.
Fully-supervised subspace learning
For the convenience of discussion, we first consider the case with two camera views under the fully-supervised setting. Suppose we are given $n$ training instances observed from two camera views $A$ and $B$, respectively denoted as $X = \{x_i\}_{i=1}^{n}$ and $Y = \{y_i\}_{i=1}^{n}$, where $l_i$ is the corresponding identity label. The task of traditional subspace learning is to learn a projection $\omega$ such that the distance between $x_i$ and $y_j$ is computed as
$$d(x_i, y_j) = \lVert \omega^{\top} x_i - \omega^{\top} y_j \rVert_2 .$$
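For a single shared projection, the distance just described is straightforward to compute; a minimal sketch (with `W` an assumed, already-learned projection matrix):

```python
import numpy as np

def projected_distance(x, y, W):
    # d(x, y) = || W^T x - W^T y ||_2 : Euclidean distance between the
    # two samples after both are mapped into the learned subspace.
    return np.linalg.norm(W.T @ x - W.T @ y)
```

With `W` set to the identity matrix this reduces to the plain Euclidean distance, which makes the role of the projection easy to verify.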
Datasets
We evaluate the proposed approach on five challenging datasets: four two-view datasets, VIPeR [10], PRID450S [11], PRID2011 [12] and CUHK01 [13], and one multi-view dataset, Market-1501 [14]. The VIPeR dataset is one of the most commonly used for person re-ID; it consists of 632 people captured outdoors, with 2 images per identity, one from each camera view. PRID450S contains 450 image pairs captured by two static surveillance cameras. The PRID2011 dataset is more challenging.
Conclusion
We have proposed a semi-supervised framework which can effectively utilize both labeled and unlabeled training samples to address the annotation reliance problem in supervised person re-ID. Compared to existing methods, our work is one of the few attempts that focuses on semi-supervised learning for re-ID without resorting to deep learning. It enables learning view-specific transformations with limited annotated data, thus alleviating the need for exhaustive manual annotation. The method has been validated through extensive experiments on five widely-used datasets.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgment
This work was supported by the National Key R&D Program of China (2019YFB2204200) and the National Natural Science Foundation of China (61972030 and 61772067).
Jieru Jia received the B.S. degree in Biological Engineering from Beijing Jiaotong University in 2012. She is currently a Ph.D. candidate in Signal and Information Processing at the Institute of Information Science, Beijing Jiaotong University. Her main research interests are in computer vision and pattern recognition, particularly focusing on person re-identification.
References (35)
- et al., Deep asymmetric video-based person re-identification, Pattern Recogn. (2019)
- et al., Equidistance constrained metric learning for person re-identification, Pattern Recogn. (2018)
- et al., Person re-identification by multiple instance metric learning with impostor rejection, Pattern Recogn. (2017)
- et al., Cross-view semantic projection learning for person re-identification, Pattern Recogn. (2018)
- et al., Improving classification with semi-supervised and fine-grained learning, Pattern Recogn. (2019)
- et al., Semi-supervised person re-identification using multi-view clustering, Pattern Recogn. (2019)
- et al., Cross-view asymmetric metric learning for unsupervised person re-identification, Proc. IEEE International Conference on Computer Vision (ICCV) (2017)
- et al., Unsupervised person re-identification by soft multilabel learning, Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019)
- et al., Patch-based discriminative feature learning for unsupervised person re-identification, Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019)
- et al., Transferable joint attribute-identity deep learning for unsupervised person re-identification, Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2018)
- Dictionary learning with iterative Laplacian regularisation for unsupervised person re-identification, Proc. British Machine Vision Conference (BMVC)
- Enhancing person re-identification in a self-trained subspace, ACM Trans. Multimedia Comput.
- Re-ranking person re-identification with k-reciprocal encoding, Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- A pose-sensitive embedding for person re-identification with expanded cross neighborhood re-ranking, Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- Viewpoint invariant pedestrian recognition with an ensemble of localized features, Lecture Notes in Computer Science
- Mahalanobis distance learning for person re-identification, Person Re-Identification
- Person re-identification by descriptive and discriminative classification, Lecture Notes in Computer Science
Qiuqi Ruan received the B.S. and M.S. degrees from Northern Jiaotong University, P.R. China, in 1969 and 1981, respectively. He is currently a Professor and a Doctoral Supervisor at the Institute of Information Science, Beijing Jiaotong University. His main research interests include digital signal processing, computer vision, pattern recognition, and virtual reality.
Yi Jin received the Ph.D. degree in Signal and Information Processing from the Institute of Information Science, Beijing Jiaotong University in 2010. She is currently an Associate Professor in the School of Computer Science and Information Technology, Beijing Jiaotong University. Her research interests include computer vision, pattern recognition and machine learning.
Gaoyun An received the B.S. degree in Biological Engineering and the Ph.D. degree in Signal and Information Processing from Beijing Jiaotong University in 2003 and 2008, respectively. Currently, he is an Associate Professor at the Institute of Information Science, Beijing Jiaotong University. His main research interests include image processing, computer vision and pattern recognition.
Shiming Ge received the B.S. and Ph.D. degrees from the University of Science and Technology of China (USTC) in Hefei. Currently, he is an Associate Professor and a Doctoral Supervisor at the Institute of Information Engineering, Chinese Academy of Sciences. His homepage is http://www.escience.cn/people/geshiming.