Multiple metric learning with query adaptive weights and multi-task re-weighting for person re-identification

doi:10.1016/j.cviu.2017.04.003

Computer Vision and Image Understanding

Volume 160, July 2017, Pages 87-99

https://doi.org/10.1016/j.cviu.2017.04.003

Highlights

• A metric ensemble model, in which the final metric is a weighted sum of multiple sub-metrics, is proposed.
• A two-step weighting procedure is proposed that assigns the weights both separately and collaboratively.
• A novel algorithm for calculating query adaptive weights is proposed.
• The feature map and loss function of the multi-task structural SVM are tailored to each specific feature type.

Abstract

Metric learning has been widely studied in person re-identification (re-id). However, most existing metric learning methods learn only one holistic Mahalanobis distance metric for the concatenated high-dimensional feature vector. This single metric learning strategy cannot handle complex nonlinear data structure and is prone to overfitting. Moreover, feature concatenation cannot exploit the discriminant capability of the individual features, and low-dimensional features tend to be dominated by high-dimensional ones. Motivated by these problems, we propose a multiple metric learning method for the re-id problem, where individual sub-metrics are learned separately for each feature type and the final metric is formed as a weighted sum of the sub-metrics. The sub-metrics are learned with the Cross-view Quadratic Discriminant Analysis (XQDA) algorithm, and the weights of the sub-metrics are assigned in a two-step procedure. First, the importance of each feature type is estimated according to its discriminative power, which is measured in a query adaptive manner using partial Area Under Curve (pAUC) scores. Then, the weights of all feature types are learned simultaneously with a maximum-margin multi-task structural SVM learning framework, to ensure that relevant gallery images are ranked before irrelevant ones within all feature spaces. Finally, the sub-metrics are integrated with the learned weights in an ensemble model, generating a sophisticated distance metric. Experiments on the challenging i-LIDS, VIPeR, CAVIAR and 3DPeS datasets demonstrate the effectiveness of the proposed method.

Introduction

Person re-identification (re-id), which aims to re-identify a target person in one camera after he/she disappears from another, has attracted considerable interest in recent years. It is a special case of image retrieval (Zheng, Wang, Liu, Tian, 2014, Zheng, Wang, Tian, He, Liu, Tian, 2015b) in video surveillance and faces severe challenges such as significant variations in viewpoint, pose and illumination, as well as occlusion.

Feature extraction and metric learning are two key components in person re-id. Numerous features have been proposed for the problem, e.g. color histograms (CH) (Zheng, Yang, Hauptmann, Zheng, Gong, Xiang, 2013), color names (CN) (Liu, Wang, Wu, Yang, Yang, 2015, Zheng, Wang, Tian, He, Liu, Tian, 2015b), textures (Lisanti, Masi, Bagdanov, Bimbo, 2015, Zheng, Gong, Xiang, 2013) and attributes (Layne et al., 2012). Specifically, commonly exploited color histograms include RGB, HSV, YUV and Lab (Zheng et al., 2013), while commonly used texture features include Schmid, Gabor, LBP and HOG (Lisanti et al., 2015). Attributes (Layne et al., 2012) are semantic descriptions of people such as hair style, shoe type or clothing style.
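To make the color histogram feature concrete, here is a minimal sketch of one common way such a descriptor is built: the image is cut into horizontal stripes and an L1-normalized histogram is computed per stripe and per channel. The stripe count, bin count, function name `stripe_color_histogram` and the random stand-in image are all hypothetical choices for illustration, not the settings used in this paper.

```python
import numpy as np

def stripe_color_histogram(image, n_stripes=6, n_bins=8):
    """Concatenate per-stripe, per-channel color histograms of one image.

    image: (H, W, C) uint8 array, e.g. RGB, or HSV already scaled to 0..255.
    Hypothetical settings: n_stripes horizontal stripes, n_bins bins per channel.
    """
    edges = np.linspace(0, image.shape[0], n_stripes + 1).astype(int)
    feats = []
    for top, bottom in zip(edges[:-1], edges[1:]):
        stripe = image[top:bottom]
        for c in range(image.shape[2]):
            hist, _ = np.histogram(stripe[..., c], bins=n_bins, range=(0, 256))
            feats.append(hist / max(hist.sum(), 1))  # L1-normalize each histogram
    return np.concatenate(feats)


rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(128, 48, 3)).astype(np.uint8)  # stand-in image
f = stripe_color_histogram(img)
print(f.shape)  # 6 stripes x 3 channels x 8 bins = 144 dimensions
```

Each feature type in the paper (color histogram, color names, texture, ...) would yield a vector like `f`, and it is these per-type vectors that receive separate sub-metrics.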

For metric learning, a distance metric is learned from training samples such that the inter-class distance is maximized whilst the intra-class distance is minimized. Many metric learning methods have been proposed for person re-id, such as Relative Distance Comparison (RDC) (Zheng et al., 2013), Large Margin Nearest Neighbor (LMNN) (Dikmen et al., 2010), Keep It Simple and Straightforward (KISSME) (Roth et al., 2012) and Pairwise Constrained Component Analysis (PCCA) (Jurie and Mignon, 2012).

While these methods achieve encouraging performance, they all learn a single distance metric for all the heterogeneous features. This has two weaknesses. First, single metric learning is not robust to the complex nonlinear data structure in person re-id: images of the same person may be far apart due to dramatic appearance variations, while images of different people may be very close, e.g., two different people wearing clothes of similar color or pattern (Jia et al., 2016). Second, single metric learning suffers from the Small Sample Size (SSS) problem, i.e. the number of training samples is far smaller than the feature dimension. Typically only hundreds of training samples are available, whereas the feature dimension is often in the thousands or higher. Consequently, traditional single metric learning methods have to resort to dimensionality reduction and/or matrix regularization, which may lead to sub-optimal solutions and loss of discriminative power (Zhang et al., 2016a).
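The SSS argument can be checked numerically. Metric learners that invert covariance estimates (KISSME and XQDA both do) break down when the sample count is below the dimension, because a covariance estimated from n samples has rank at most n − 1. The sketch below uses hypothetical sizes and random data purely to show that effect and the ridge regularization workaround mentioned above; it is not the paper's experimental setup.

```python
import numpy as np

# Hypothetical sizes that mimic the SSS regime: far fewer samples than dimensions.
rng = np.random.default_rng(0)
n_samples, dim = 200, 1000
X = rng.standard_normal((n_samples, dim))

# Sample covariance of the training features (dim x dim).
cov = np.cov(X, rowvar=False)

# Centered data from n samples span at most n - 1 directions, so this
# dim x dim matrix is rank-deficient and has no inverse.
rank = np.linalg.matrix_rank(cov)
print(rank, "<=", n_samples - 1)

# Ridge (Tikhonov) regularization cov + lam * I restores full rank,
# at the price of biasing the covariance estimate.
lam = 1e-3
cov_reg = cov + lam * np.eye(dim)
rank_reg = np.linalg.matrix_rank(cov_reg)
print(rank_reg)  # equals dim
```

The value of `lam` is an arbitrary assumption; in practice it trades numerical stability against distortion of the learned metric, which is the sub-optimality the paragraph refers to.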

Another issue with single metric learning is that it cannot handle multiple feature representations directly. In person re-id, multiple heterogeneous features such as color histograms, textures and color names are usually available for each image. Each feature has its own characteristics and yields different performance, which makes feature fusion a hard task. In most metric learning algorithms, the different features are concatenated into a high-dimensional vector and a single distance metric is learned on the combined vector.

This feature-level fusion scheme has several disadvantages compared to score-level fusion (Eisenbach, Kolarow, Vorndran, Niebling, 2015, Zheng, Wang, Tian, He, Liu, Tian, 2015b) and decision-level fusion (Liu, Wang, Wu, Yang, Yang, 2015, de Prates, Schwartz, 2015) approaches. First, the differing importance and discriminant capability of the individual features are ignored: each feature is weighted uniformly, regardless of its particular characteristics. Second, low-dimensional features tend to be neglected when combined with high-dimensional features (Liu et al., 2015), so their discriminant power may be lost. Third, concatenation makes the feature dimension very high, which easily leads to overfitting, because pair- or triplet-based constraints become much easier to satisfy in a high-dimensional feature space; this in turn results in poor generalization.
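The second disadvantage can be illustrated with a toy calculation. Assuming (hypothetically) a 3-d and a 500-d feature whose entries all have unit variance and no per-feature normalization, the squared Euclidean distance on the concatenated vector is almost entirely determined by the 500-d part. Dividing each feature's distance by its mean before summing, one simple score-level fusion choice and not necessarily the scheme of the cited works, restores a comparable contribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical features: 3-d and 500-d, i.i.d. unit-variance entries.
n_pairs, d_low, d_high = 10_000, 3, 500
low_a, low_b = rng.standard_normal((2, n_pairs, d_low))
high_a, high_b = rng.standard_normal((2, n_pairs, d_high))

# Feature-level fusion: the squared distance on the concatenation is the
# sum of the per-feature squared distances.
d_low_sq = ((low_a - low_b) ** 2).sum(axis=1)
d_high_sq = ((high_a - high_b) ** 2).sum(axis=1)
d_concat = d_low_sq + d_high_sq

# Average share of the low-dimensional feature, roughly 3 / 503 here.
share_low = np.mean(d_low_sq / d_concat)
print(f"low-dim share (concatenation): {share_low:.4f}")

# Score-level alternative: normalize each feature's distance by its mean.
norm_low = d_low_sq / d_low_sq.mean()
norm_high = d_high_sq / d_high_sq.mean()
share_low_fused = np.mean(norm_low / (norm_low + norm_high))
print(f"low-dim share (normalized scores): {share_low_fused:.4f}")
```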

In light of these problems, we propose a multiple metric learning method for the re-id problem, where the final metric is learned as a weighted sum of a set of sub-metrics. In contrast to learning a unitary metric for all the features, a sub-metric is learned separately for each feature type, see Fig. 1. The sub-metrics can be learned with off-the-shelf metric learning methods, since sub-metric learning itself is not the focus of this work.
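The paper learns each sub-metric with XQDA (Liao et al., 2015). Below is a simplified sketch of an XQDA-style learner for a single feature type, based on our reading of that algorithm: estimate covariances of intra-personal and extra-personal cross-view differences, keep the directions that maximize their variance ratio, and use $(W^\top\Sigma_I W)^{-1} - (W^\top\Sigma_E W)^{-1}$ as the metric kernel. It enumerates all pairs by brute force, omits the efficiency tricks of the original, and the function names, the regularizer `reg` and the toy data are assumptions.

```python
import numpy as np

def xqda(X, Z, y_x, y_z, r=5, reg=1e-3):
    """Simplified XQDA-style sub-metric for ONE feature type (sketch).

    X: (n, d) camera-A features, Z: (m, d) camera-B features,
    y_x, y_z: identity labels. Returns projection W (d x r) and kernel M (r x r).
    """
    diffs = X[:, None, :] - Z[None, :, :]          # all cross-view differences
    same = y_x[:, None] == y_z[None, :]            # same-identity mask
    intra, extra = diffs[same], diffs[~same]       # (k_I, d), (k_E, d)
    d = X.shape[1]
    cov_i = intra.T @ intra / len(intra) + reg * np.eye(d)  # intra-personal
    cov_e = extra.T @ extra / len(extra)                    # extra-personal
    # Generalized eigenproblem cov_e w = lam * cov_i w, via Cholesky whitening.
    L_inv = np.linalg.inv(np.linalg.cholesky(cov_i))
    evals, U = np.linalg.eigh(L_inv @ cov_e @ L_inv.T)
    top = np.argsort(evals)[::-1][:r]              # largest variance ratios
    W = L_inv.T @ U[:, top]
    # Metric kernel inside the r-dimensional subspace.
    M = np.linalg.inv(W.T @ cov_i @ W) - np.linalg.inv(W.T @ cov_e @ W)
    return W, M


def xqda_distance(x, z, W, M):
    """d(x, z) = (x - z)^T W M W^T (x - z)."""
    p = (x - z) @ W
    return float(p @ M @ p)


# Hypothetical toy data: 30 identities, one 20-d vector per camera.
rng = np.random.default_rng(0)
n_id, dim = 30, 20
identity = 3.0 * rng.standard_normal((n_id, dim))
X = identity + rng.standard_normal((n_id, dim))    # camera A
Z = identity + rng.standard_normal((n_id, dim))    # camera B
labels = np.arange(n_id)
W, M = xqda(X, Z, labels, labels, r=5)

same_d = np.mean([xqda_distance(X[i], Z[i], W, M) for i in range(n_id)])
diff_d = np.mean([xqda_distance(X[i], Z[j], W, M)
                  for i in range(n_id) for j in range(n_id) if i != j])
print(same_d < diff_d)  # matched pairs should be closer on average
```

In the multiple metric setting, this routine would simply be called once per feature type, giving one $(W_i, M_i)$ pair each, instead of once on the concatenated vector.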

This raises the question of how to assign weights to the sub-metrics. We argue that different feature types should not be assigned the same weight, as they implicitly are in single metric learning. Moreover, the importance of each feature type is not constant across individuals; instead, it should be measured with respect to the query in question. Certain appearance features can be more important than others for describing a specific query and distinguishing him/her from other people (Liu et al., 2012). For instance, if the query is wearing a bright shirt and pants without any texture, color features are clearly more important and should be given more weight. However, if the query is wearing a plaid shirt, or the illumination change is too drastic to rely on color features, texture information becomes critical and should be given more weight. With this intuition, we propose a query adaptive weight learning strategy for the sub-metrics.
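The paper measures this per-query importance with pAUC scores. The sketch below shows one plausible way to do so under stated assumptions: an empirical partial AUC over the false positive range $[0, \text{max\_fpr}]$ computed from one query's distances and known relevance labels (available only for training queries), then weights proportional to the pAUC of each feature type. The cut-off `max_fpr`, the sum-to-one normalization and both function names are hypothetical; this is not the authors' exact algorithm.

```python
import numpy as np

def partial_auc(dist, relevant, max_fpr=0.1):
    """Empirical pAUC for one query over FPR in [0, max_fpr], scaled to [0, 1].

    dist: distances from the query to every gallery image (smaller = closer).
    relevant: boolean mask, True where the gallery image has the query's identity.
    Assumes at least one relevant and one irrelevant gallery image.
    """
    dist = np.asarray(dist, dtype=float)
    relevant = np.asarray(relevant, dtype=bool)
    pos = dist[relevant]                  # relevant gallery images
    neg = np.sort(dist[~relevant])        # irrelevant ones, closest first
    n_neg = max(1, int(np.floor(max_fpr * neg.size)))
    # For each of the n_neg closest irrelevant images, count the relevant
    # images ranked strictly ahead of it (strictly smaller distance).
    hits = (pos[None, :] < neg[:n_neg, None]).sum()
    return hits / (pos.size * n_neg)


def query_adaptive_weights(dists_per_feature, relevant, max_fpr=0.1):
    """Hypothetical rule: weight of each feature type proportional to its pAUC."""
    scores = np.array([partial_auc(d, relevant, max_fpr) for d in dists_per_feature])
    if scores.sum() == 0:
        return np.full(len(scores), 1.0 / len(scores))  # fall back to uniform
    return scores / scores.sum()


# Toy query: one relevant gallery image (index 0) and four irrelevant ones.
relevant = [True, False, False, False, False]
color_dist = [0.1, 0.5, 0.6, 0.7, 0.8]     # color ranks the match first
texture_dist = [0.3, 0.2, 0.5, 0.8, 0.9]   # texture ranks it second
w = query_adaptive_weights([color_dist, texture_dist], relevant, max_fpr=0.5)
print(w)  # color pAUC = 1.0, texture pAUC = 0.5, so weights 2/3 and 1/3
```

Restricting the area to low false positive rates focuses the score on the top of the ranking, which is what matters for re-id matching.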

Furthermore, although metric learning is conducted separately, the different feature types are not totally independent but still related. They are complementary and may share information with each other (Cui, Li, Xu, Shan, Chen, 2013, Hu, Lu, Yuan, Tan, 2015). Therefore, we aim to learn the weights collaboratively so as to guarantee correct ranking within all feature spaces. To this end, by modeling the ranking in each feature space as a separate task, we use a maximum-margin multi-task learning framework to learn the weights jointly. The framework treats the rankings of the various feature types as different but related tasks and enables information propagation among tasks. By exploiting the relatedness among tasks, the framework aims to ensure that, within all feature spaces, the gallery images relevant to the query are ranked before irrelevant ones. The multi-task ranking model is superior to traditional ranking methods (Paisitkriangkrai, Shen, van den Hengel, 2015, Wu, Mukunoki, Funatomi, Minoh, Lao, 2011), especially when the training sample size for each task is small (Su et al., 2015).

In this paper, we propose multiple metric learning with a two-step weighting procedure for the re-id problem. Multiple sub-metrics are learned separately and linearly combined to form the final metric. The sub-metrics are learned with XQDA (Liao et al., 2015) owing to its effectiveness and high computational efficiency; the algorithm learns a discriminant low-dimensional subspace and the corresponding metric simultaneously. The weights of the sub-metrics are assigned in two steps. First, a query adaptive weight is assigned to each sub-metric. The weight is estimated with pAUC scores, which can measure the discriminative ability of different features, as stated in Zheng et al. (2015b), Eisenbach et al. (2015) and Zhao et al. (2014). Second, to reach a more refined decision, re-weighting parameters that rank relevant gallery images before irrelevant ones are learned simultaneously with a multi-task structural SVM learning framework. The ranking within each feature space is modeled as a task, and task relatedness is modeled by requiring all tasks to be close to their mean, following (Pontil, 2004). Finally, the weighted sub-metrics are integrated in an ensemble model. In this way, the discriminant information of each feature type is effectively exploited, and since the feature dimension of each metric learning task is reduced, overfitting is alleviated. Our method performs well and runs fast, even when only a few training samples are available. Experiments on four challenging datasets, i-LIDS (Zheng et al., 2009), VIPeR (Gray and Tao, 2008), CAVIAR (Dong et al., 2011) and 3DPeS (Baltieri et al., 2011), demonstrate the effectiveness of the proposed method.
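The assumption that all tasks stay close to their mean is commonly written by splitting each task's parameter vector into a shared part and a task-specific offset. The formulation below is a hedged sketch of how that assumption can be combined with structural SVM ranking constraints; it is not taken from the paper, and the symbols $\mathbf{w}_0$, $\mathbf{v}_t$, $\Psi_t$, $\Delta_t$, $\lambda_0$, $\lambda_1$ and $\xi_{t,q}$ are our own notation:

$$\min_{\mathbf{w}_0,\{\mathbf{v}_t\},\{\xi_{t,q}\}} \; \lambda_0 \|\mathbf{w}_0\|^2 + \frac{\lambda_1}{T}\sum_{t=1}^{T}\|\mathbf{v}_t\|^2 + \sum_{t=1}^{T}\sum_{q}\xi_{t,q}$$

$$\text{s.t.}\quad \big\langle \mathbf{w}_0 + \mathbf{v}_t,\; \Psi_t(q, y_q^{*}) - \Psi_t(q, y) \big\rangle \ge \Delta_t(y_q^{*}, y) - \xi_{t,q}, \quad \xi_{t,q} \ge 0, \quad \forall\, t,\ q,\ y \ne y_q^{*}$$

Here task $t$ is the ranking in the $t$-th of the $T$ feature spaces, with parameter $\mathbf{w}_t = \mathbf{w}_0 + \mathbf{v}_t$; $y_q^{*}$ is a ranking of the gallery for query $q$ that places every relevant image before every irrelevant one; $\Psi_t$ is a joint feature map and $\Delta_t$ a ranking loss, both specific to feature type $t$. A large $\lambda_1$ shrinks the offsets $\mathbf{v}_t$ and keeps every task near the shared component $\mathbf{w}_0$, while the slack variables $\xi_{t,q}$ let individual ranking constraints be violated at a cost.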

The main contributions of the proposed method are:

• A metric ensemble model is proposed, in which the final metric is represented as a weighted sum of multiple sub-metrics; it is more discriminative and effective than holistic metric learning.
• A two-step weighting strategy is proposed, which assigns weights to the sub-metrics both separately and collaboratively. Separately, the weight of each sub-metric is learned adaptively to the query. Collaboratively, the weights are set to ensure proper ranking within all feature spaces.
• A novel algorithm for calculating query adaptive weights is proposed, which takes the similarity labels between query and gallery images into account when computing pAUC scores.
• Structural SVM learning is extended to a multi-task framework that tailors the feature map and loss function to each specific feature type.

The rest of the paper is organized as follows. Related work is briefly reviewed in Section 2. The metric learning algorithm, the query adaptive weighting strategy and the multi-task structural SVM learning are introduced in Section 3. Experimental results are presented in Section 4. Finally, concluding remarks and suggestions for future work are given in Section 5.

Section snippets

Related work

A. Person re-identification

Existing person re-id methods can be roughly divided into two categories: feature based and metric learning based. Feature based approaches focus on designing a feature representation that is both distinctive and robust to large appearance variations. Different kinds of features have been proposed in previous work, for instance, histograms from various color and textures (Lisanti, Masi, Bagdanov, Bimbo, 2015, Liu, Wang, Wu, Yang, Yang, 2015, Zheng, Gong, Xiang, 2013),

Proposed method

In contrast to learning a universal metric, we aim to learn a set of preliminary sub-metrics, each computed on a specific feature type, which are then linearly combined into a sophisticated metric using query adaptive weights and multi-task structural SVM weight learning. The basic flow of the proposed method is depicted in Fig. 2. More concretely, the final metric is constructed as follows:

$$M = \sum_{i=1}^{T} w_i M_i$$

where $T$ is the number of feature types, $M_i$ is the sub-metric learned for the $i$-th feature type, and $w_i$, $i = 1, 2, \dots, T$, are the learned weights.
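A minimal sketch of how the ensemble metric can be evaluated, assuming that each $M_i$ acts only on the coordinates of its own feature type. Under that assumption, $M$ is block-diagonal on the concatenated feature vector, and the distance is simply the weighted sum of the per-feature Mahalanobis distances. The function names and toy values are hypothetical.

```python
import numpy as np

def ensemble_distance(query_parts, gallery_parts, sub_metrics, weights):
    """Weighted sum of per-feature Mahalanobis distances.

    query_parts / gallery_parts: one vector per feature type (dims may differ).
    sub_metrics: one d_i x d_i matrix M_i per feature type.
    weights: one scalar w_i per feature type.
    """
    total = 0.0
    for q, g, M_i, w_i in zip(query_parts, gallery_parts, sub_metrics, weights):
        diff = np.asarray(q, dtype=float) - np.asarray(g, dtype=float)
        total += w_i * float(diff @ M_i @ diff)
    return total


def block_diagonal_metric(sub_metrics, weights):
    """Embed each w_i * M_i as a diagonal block of one large metric matrix."""
    dims = [M.shape[0] for M in sub_metrics]
    big = np.zeros((sum(dims), sum(dims)))
    start = 0
    for M_i, w_i, d_i in zip(sub_metrics, weights, dims):
        big[start:start + d_i, start:start + d_i] = w_i * M_i
        start += d_i
    return big


# Toy example: two feature types of dimension 2 and 3 (hypothetical values).
sub_metrics = [np.eye(2), np.eye(3)]
weights = [0.7, 0.3]
query = [np.array([1.0, 0.0]), np.zeros(3)]
gallery = [np.zeros(2), np.ones(3)]

d_sum = ensemble_distance(query, gallery, sub_metrics, weights)
diff = np.concatenate(query) - np.concatenate(gallery)
d_block = float(diff @ block_diagonal_metric(sub_metrics, weights) @ diff)
print(d_sum, d_block)  # both 0.7 * 1 + 0.3 * 3 = 1.6, up to rounding
```

Evaluating the sum term by term avoids ever materializing the large block-diagonal matrix, which keeps the cost proportional to the sum of the squared per-feature dimensions.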

Experimental results

In this section, four of the most challenging datasets are adopted for evaluation: i-LIDS (Zheng et al., 2009), VIPeR (Gray and Tao, 2008), CAVIAR (Dong et al., 2011) and 3DPeS (Baltieri et al., 2011). These datasets have different characteristics (outdoor/indoor, large/small variations in view angle, constant/varying image scale, presence/absence of occlusion) and give a faithful representation of real-world challenges for person re-id. Some samples are illustrated in Fig. 4.

Setting: We

Conclusion

In this paper, we present an efficient and effective method for person re-id. Instead of single metric learning, we propose to learn multiple sub-metrics for the problem, one sub-metric for each feature type. The sub-metrics are learned with Cross-view Quadratic Discriminant Analysis, which learns a discriminant low dimensional subspace and derived metric at the same time. The sub-metrics are linearly combined to form the final metric with weights learned by a two-step procedure. The first step

Acknowledgment

This work was supported in part by the National Natural Science Foundation of China (61471032, 61472030, 61403024), the Program for Innovative Research Team in University of Ministry of Education of China (IRT-16R04), and the Beijing Natural Science Foundation (No. 4163075).

References (59)

S. Ding et al.
Deep feature learning with relative distance comparison for person re-identification
Pattern Recognit.
(2015)
J. Jia et al.
Geometric preserving local fisher discriminant analysis for person re-identification
Neurocomputing
(2016)
V. Vapnik et al.
A new learning paradigm: learning using privileged information
Neural Netw.
(2009)
T. Avraham et al.
Learning implicit transfer for person re-identification
Proceedings of the International Conference on Computer Vision
(2012)
D. Baltieri et al.
3DPeS: 3D people dataset for surveillance and forensics
International ACM Workshop on Multimedia Access To 3d Human Objects
(2011)
D. Chen et al.
Similarity learning with spatial constraints for person re-identification
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
(2016)
S.Z. Chen et al.
Deep ranking for person re-identification via joint representation learning
IEEE Trans. Image Process.
(2015)
Y.-C. Chen et al.
Mirror representation for modeling view-specific transform in person re-identification
International Joint Conference on Artificial Intelligence
(2015)
Z. Cui et al.
Fusing robust face region descriptors via multiple metric learning for face recognition in the wild
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
(2013)
J.V. Davis et al.
Information-theoretic metric learning
Machine Learning, Proceedings of the Twenty-Fourth International Conference
(2007)

A.J. Ma et al.
Cross-domain person reidentification using domain adaptation ranking SVMs
IEEE Trans. Image Process.
(2015)





Pedestrian recognition with a learned metric
Proceedings of the Asian Conference on Computer Vision

Custom pictorial structures for re-identification
British Machine Vision Conference

Evaluation of multi feature fusion at score-level for appearance-based person re-identification
International Joint Conference on Neural Networks (IJCNN), 2015

Person re-identification by symmetry-driven accumulation of local features
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

Generalized SMO algorithm for SVM-based multitask learning
IEEE Trans. Neural Networks Learn. Syst.

Metric learning by collapsing classes
Adv. Neural Inf. Process. Syst.

Viewpoint invariant pedestrian recognition with an ensemble of localized features
European Conference on Computer Vision

Large margin multi-metric learning for face and kinship verification in the wild
Proceedings of the Asian Conference on Computer Vision

PCCA: A new approach for distance learning from sparse pairwise constraints
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

Synergy-based learning of facial identity
Proc. DAGM Symposium

Person re-identification by attributes
British Machine Vision Conference

Learning locally-adaptive decision functions for person verification
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

Connection between SVM+ and multi-task learning
IEEE International Joint Conference on Neural Networks

Person re-identification by local maximal occurrence representation and metric learning
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

Efficient PSD constrained asymmetric metric learning for person re-identification
Proceedings of the IEEE International Conference on Computer Vision

Person re-identification by iterative re-weighted sparse ranking
IEEE Trans. Pattern Anal. Mach. Intell.

Person re-identification: what features are important?
European Conference on Computer Vision, International Workshop on Re-Identification

An ensemble color model for human re-identification
Applications of Computer Vision
