View-specific subspace learning and re-ranking for semi-supervised person re-identification
Introduction
Person re-identification (re-ID), which aims at matching people across multiple non-overlapping camera views, has become a hot research topic in the pattern recognition community [1]. It aims to re-identify a target in one camera after he/she disappears from another. This is quite challenging, since human appearance often exhibits severe variations across different camera views due to arbitrary changes in viewpoint, illumination, pose, occlusion, etc.
Despite numerous recent efforts, some issues remain unsolved in person re-ID, one of which is that most existing methods are based on supervised learning and require extensive labeled image pairs for training, i.e. annotation reliance. However, manual label annotation is labor-intensive and time-consuming, and sometimes even unreliable, since a large number of images need to be inspected across multiple camera views [2]. Moreover, the reliance on labels severely limits applicability and scalability in real-world settings, where a huge number of images are available but not labeled.
One intuitive solution is to carry out person re-ID using only unlabeled data (i.e. unsupervised), which are far more abundant than label information. Several works have pursued this direction [3], [4], [5], but their matching performance still lags well behind that of supervised counterparts. The main reason is that unsupervised methods cannot benefit from labeled cross-view discriminative information, which is fundamental to matching the same identity while discriminating against impostors [6]. In light of this, a semi-supervised framework was proposed in [7]. Nonetheless, the work in [7] treats samples from different views in the same manner and learns a unitary transformation for images across various camera views. As a result, the view-specific interference caused by the particular indoor/outdoor environment within each camera view is neglected, such as specific illumination conditions in one camera and occlusions in another [2].
To this end, we put forward a subspace learning approach that learns specific projections for each camera view, also under a semi-supervised setting but taking view-specific bias into consideration. First, we learn an initial projection for each view from the limited manually labeled matching pairs. Second, the unlabeled data are mapped into a subspace with the initial projections, in which pseudo cross-view correspondence relationships can be constructed. Next, new projections are learned by augmenting the original objective with a graph Laplacian regularization term that encodes the pseudo cross-view relationships. Besides, the pseudo-classes are refined with the updated projections and merged with the labeled images to create new training sets. This procedure runs iteratively until the pseudo cross-view correspondence relationships stop changing. The flowchart of the proposed View-Specific Semi-supervised Subspace Learning (VS-SSL) algorithm is illustrated in Fig. 1.
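The iterative loop above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's actual objective: plain per-view PCA stands in for the labeled-pair projection learning, nearest-neighbour matching in the projected space stands in for pseudo cross-view correspondence construction, and the refit step stands in for the graph-Laplacian-regularized update; all function names are hypothetical.

```python
import numpy as np

def learn_projection(X, dim):
    # Stand-in for supervised projection learning: plain PCA on one view.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:dim].T                              # (d, dim) projection

def pseudo_pair(Za, Zb):
    # Pseudo cross-view correspondences: each projected sample from view A
    # is matched to its nearest neighbour among projected view-B samples.
    d2 = ((Za[:, None, :] - Zb[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)

def vs_ssl_sketch(Xa, Xb, dim=2, max_iter=10):
    # Iterate: project each view with its OWN matrix, rebuild pseudo pairs,
    # refit, and stop once the pairings stop changing.
    Wa, Wb = learn_projection(Xa, dim), learn_projection(Xb, dim)
    pairs = pseudo_pair(Xa @ Wa, Xb @ Wb)
    for _ in range(max_iter):
        # Refit view B's projection on the pseudo-matched samples
        # (a crude stand-in for the graph-regularized update).
        Wb = learn_projection(Xb[pairs], dim)
        new_pairs = pseudo_pair(Xa @ Wa, Xb @ Wb)
        if np.array_equal(new_pairs, pairs):
            break
        pairs = new_pairs
    return Wa, Wb, pairs
```

The key design point mirrored here is that each view keeps its own projection matrix, so view-specific bias can be absorbed per camera rather than forced into one shared transformation.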
Moreover, after computing the initial distances with the proposed VS-SSL algorithm, a re-ranking step can be adopted to further enhance the performance. Re-ranking is a common practice in person re-ID [8], [9]: it re-estimates the similarities between the probe and gallery images so as to place more relevant images at the top of the returned list. However, most existing re-ranking methods need to recompute the neighborhood lists for each query-gallery image pair, which is computationally demanding. To cope with this inefficiency, the second contribution of this work is an efficient re-ranking strategy built on the assumption that true matches and the query should not only share a multitude of mutual nearest neighbors, but also rank highly in each other's ranking lists.
To be specific, we first build the Expanded Cross Neighborhood (ECN) [9] for the query and gallery images, which consists of the immediate top neighbors (first level) of an image and the neighbors (second level) of each element in the first-level list, see Fig. 2. Second, we compute the overlap ratio between the expanded neighborhoods of each query-gallery image pair, which is leveraged as the contextual similarity. The underlying assumption is that the more similar neighbors two images share, the more likely they are of the same person. Next, we sort the elements in each gallery image's ECN according to their similarity to that gallery image, and propose a novel reciprocal content similarity measure of an image pair based on their positions in each other's ranking list. The new similarity score after re-ranking is defined as the product of the contextual and reciprocal content similarities, and is further combined with the original similarity scores to increase robustness. The revised ranking list is obtained by sorting the final similarity scores in descending order.
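The re-ranking steps above can be condensed into a short sketch. This is an illustrative simplification rather than the exact method: global ranking lists replace the per-gallery sorted ECNs, the reciprocal content similarity is modeled as a simple inverse product of mutual ranks, and the fusion weight `alpha` is an assumed parameter; all names are hypothetical.

```python
import numpy as np

def topk(dist_row, k):
    # Indices of the k nearest items for one row of the distance matrix.
    return set(np.argsort(dist_row)[:k])

def expanded_cross_neighborhood(dist, idx, k=5):
    # First-level neighbours of `idx`, expanded with each neighbour's
    # own top-k list (second-level neighbours).
    first = topk(dist[idx], k)
    ecn = set(first)
    for j in first:
        ecn |= topk(dist[j], k)
    return ecn

def rerank(dist, orig_sim, k=5, alpha=0.5):
    n = dist.shape[0]
    ecns = [expanded_cross_neighborhood(dist, i, k) for i in range(n)]
    # 1-based rank of every item in every other item's ranking list.
    ranks = np.argsort(np.argsort(dist, axis=1), axis=1) + 1
    sim = np.zeros((n, n))
    for q in range(n):
        for g in range(n):
            # Contextual similarity: overlap ratio of the two ECNs.
            ctx = len(ecns[q] & ecns[g]) / len(ecns[q] | ecns[g])
            # Reciprocal content similarity: large only when q and g sit
            # near the top of each other's lists (illustrative form).
            rec = 1.0 / (ranks[q, g] * ranks[g, q])
            sim[q, g] = ctx * rec
    # Fuse with the original similarity scores for robustness.
    return alpha * sim + (1 - alpha) * orig_sim
```

Note how a false match that ranks high for the query but ranks the query low in its own list is penalized by the reciprocal term, while the ECN overlap term captures shared context.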
Intrinsically, the proposed re-ranking strategy is independent of the VS-SSL method and can be readily applied to any ranking result. Here we integrate the two techniques into a novel pipeline for person re-ID, which proves effective in experiments on the challenging cross-view datasets VIPeR [10], PRID450S [11], PRID2011 [12], CUHK01 [13] and the multi-view dataset Market-1501 [14]. The results show that VS-SSL and the re-ranking approach both contribute to the overall performance gain and complement each other well. To summarize, the main contributions of this paper are:
- A semi-supervised VS-SSL approach for person re-ID is put forward, which can effectively exploit the limited labeled data and the abundant unlabeled images to find specific projections for each camera view, thus alleviating the view-specific biases and sparing the requirement of exhaustive data annotation.
- A novel re-ranking strategy is proposed, which can find more prospective correct matches and effectively suppress the harm of false matches, taking both contextual and reciprocal content similarity into consideration.
- A pragmatic framework for person re-ID is introduced and extensive experiments on widely-used datasets are conducted, demonstrating that the proposed method can effectively enhance the performance of semi-supervised person re-ID.
The rest of the paper is organized as follows: a brief review of related works is presented in Section 2. Details of the proposed method are introduced in Section 3. Experimental results are presented in Section 4. Finally, the conclusions are summarized in Section 5.
Semi-supervised person re-ID
Existing person re-ID methods either focus on constructing discriminative features for pedestrian images or aim at learning distance metrics or subspaces such that intra-class distances are reduced and inter-class distances are enlarged. This is achieved with equidistance constraints [15], impostor rejection [16], or semantic projection learning [17]. However, most of these methods require extensive manual annotation, which is the main bottleneck of supervised learning.
Fully-supervised subspace learning
For the convenience of discussion, we first consider the case with two camera views under the fully-supervised setting. Suppose we are given $n$ training instances observed from two camera views $A$ and $B$, respectively denoted as $X = \{x_i\}_{i=1}^{n}$ and $Y = \{y_i\}_{i=1}^{n}$, where $l_i$ is the corresponding identity label. The task of traditional subspace learning is to learn a projection $\omega$ such that the distance between $x_i$ and $y_j$ is computed as
$$d(x_i, y_j) = \lVert \omega^{\top} x_i - \omega^{\top} y_j \rVert_2 .$$
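For a single shared projection, the distance just described is straightforward to compute; a minimal sketch (with `W` an assumed, already-learned projection matrix):

```python
import numpy as np

def projected_distance(x, y, W):
    # d(x, y) = || W^T x - W^T y ||_2 : Euclidean distance between the
    # two samples after both are mapped into the learned subspace.
    return np.linalg.norm(W.T @ x - W.T @ y)
```

With `W` set to the identity matrix this reduces to the plain Euclidean distance, which makes the role of the projection easy to verify.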
Datasets
We evaluate the proposed approach on five challenging datasets: four two-view datasets, VIPeR [10], PRID450S [11], PRID2011 [12] and CUHK01 [13], and one multi-view dataset, Market-1501 [14]. The VIPeR dataset is one of the most commonly used for person re-ID; it consists of 632 people captured outdoors, with 2 images per identity, one from each camera view. PRID450S contains 450 image pairs captured by two static surveillance cameras. The PRID2011 dataset is more challenging.
Conclusion
We have proposed a semi-supervised framework which can effectively utilize both labeled and unlabeled training samples to address the annotation reliance problem in supervised person re-ID. Compared to existing methods, our work is one of the few attempts that focuses on semi-supervised learning for re-ID without resorting to deep learning. It enables learning view-specific transformations with limited annotated data, thus alleviating the need for exhaustive manual annotation. The method has been validated through extensive experiments on five widely-used datasets.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgment
This work was supported by the National Key R&D Program of China (2019YFB2204200) and the National Natural Science Foundation of China (61972030 and 61772067).
Jieru Jia received the B.S. degree in Biological Engineering from Beijing Jiaotong University in 2012. She is currently a Ph.D. candidate in Signal and Information Processing at the Institute of Information Science, Beijing Jiaotong University. Her main research interests are in computer vision and pattern recognition, particularly focusing on person re-identification.
References (35)
- et al., Deep asymmetric video-based person re-identification, Pattern Recogn. (2019)
- et al., Equidistance constrained metric learning for person re-identification, Pattern Recogn. (2018)
- et al., Person re-identification by multiple instance metric learning with impostor rejection, Pattern Recogn. (2017)
- et al., Cross-view semantic projection learning for person re-identification, Pattern Recogn. (2018)
- et al., Improving classification with semi-supervised and fine-grained learning, Pattern Recogn. (2019)
- et al., Semi-supervised person re-identification using multi-view clustering, Pattern Recogn. (2019)
- et al., Cross-view asymmetric metric learning for unsupervised person re-identification, Proc. IEEE International Conference on Computer Vision (ICCV) (2017)
- et al., Unsupervised person re-identification by soft multilabel learning, Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019)
- et al., Patch-based discriminative feature learning for unsupervised person re-identification, Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019)
- et al., Transferable joint attribute-identity deep learning for unsupervised person re-identification, Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2018)
- Dictionary learning with iterative Laplacian regularisation for unsupervised person re-identification, Proc. British Machine Vision Conference (BMVC)
- Enhancing person re-identification in a self-trained subspace, ACM Trans. Multimedia Comput.
- Re-ranking person re-identification with k-reciprocal encoding, Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- A pose-sensitive embedding for person re-identification with expanded cross neighborhood re-ranking, Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- Viewpoint invariant pedestrian recognition with an ensemble of localized features, Lecture Notes in Computer Science
- Mahalanobis distance learning for person re-identification, Person Re-Identification
- Person re-identification by descriptive and discriminative classification, Lecture Notes in Computer Science
Qiuqi Ruan received the B.S. and M.S. degrees from Northern Jiaotong University, P.R. China, in 1969 and 1981, respectively. He is currently a Professor and a Doctoral Supervisor at the Institute of Information Science, Beijing Jiaotong University. His main research interests include digital signal processing, computer vision, pattern recognition, and virtual reality.
Yi Jin received the Ph.D. degree in Signal and Information Processing from the Institute of Information Science, Beijing Jiaotong University in 2010. She is currently an Associate Professor in the School of Computer Science and Information Technology, Beijing Jiaotong University. Her research interests include computer vision, pattern recognition and machine learning.
Gaoyun An received the B.S. degree in Biological Engineering and the Ph.D. degree in Signal and Information Processing from Beijing Jiaotong University in 2003 and 2008, respectively. Currently, he is an Associate Professor at the Institute of Information Science, Beijing Jiaotong University. His main research interests include image processing, computer vision and pattern recognition.
Shiming Ge received the B.S. and Ph.D. degrees from the University of Science and Technology of China (USTC) in Hefei. Currently, he is an Associate Professor and a Doctoral Supervisor at the Institute of Information Engineering, Chinese Academy of Sciences. His homepage is http://www.escience.cn/people/geshiming.