Multiple metric learning with query adaptive weights and multi-task re-weighting for person re-identification

https://doi.org/10.1016/j.cviu.2017.04.003

Highlights

  • A metric ensemble model is proposed in which the final metric is a weighted sum of multiple sub-metrics.

  • A two-step weighting procedure is proposed that assigns weights to the sub-metrics both separately and collaboratively.

  • A novel algorithm for calculating query adaptive weights is proposed.

  • The components of the multi-task structural SVM are tailored to each specific feature type.

Abstract

Metric learning has been widely studied in person re-identification (re-id). However, most existing metric learning methods learn only one holistic Mahalanobis distance metric for the concatenated high dimensional feature. This single metric learning strategy cannot handle complex nonlinear data structures and may easily overfit. Besides, feature concatenation cannot exploit the discriminant capability of different features, and low dimensional features tend to be dominated by high dimensional ones. Motivated by these problems, we propose a multiple metric learning method for the re-id problem, where individual sub-metrics are separately learned for each feature type and the final metric is formed as a weighted sum of the sub-metrics. The sub-metrics are learned with the Cross-view Quadratic Discriminant Analysis (XQDA) algorithm and the weights of the sub-metrics are assigned in a two-step procedure. First, the importance of each feature type is estimated according to its discriminative power, which is measured in a query adaptive manner in relation to the partial Area Under Curve (pAUC) scores. Then, the weights of all feature types are learned simultaneously with a maximum-margin based multi-task structural SVM learning framework, in order to make sure that relevant gallery images are ranked before irrelevant ones within all feature spaces. Finally, the sub-metrics are integrated with the learned weights in an ensemble model, generating a sophisticated distance metric. Experiments on the challenging i-LIDS, VIPeR, CAVIAR and 3DPeS datasets demonstrate the effectiveness of the proposed method.

Introduction

Person re-identification (re-id), which aims to re-identify a target person in one camera after he/she disappears from another, has attracted great interest in recent decades. It is a special case of image retrieval (Zheng, Wang, Liu, Tian, 2014, Zheng, Wang, Tian, He, Liu, Tian, 2015b) in video surveillance and faces severe challenges such as occlusions and significant variations in viewpoint, pose and illumination.

Feature extraction and metric learning are two key components in person re-id. Numerous features have been proposed for the problem, e.g. color histograms (CH) (Zheng, Yang, Hauptmann, Zheng, Gong, Xiang, 2013), color names (CN) (Liu, Wang, Wu, Yang, Yang, 2015, Zheng, Wang, Tian, He, Liu, Tian, 2015b), textures (Lisanti, Masi, Bagdanov, Bimbo, 2015, Zheng, Gong, Xiang, 2013) and attributes (Layne et al., 2012). Specifically, frequently used color histograms include RGB, HSV, YUV and Lab (Zheng et al., 2013), while commonly used texture features include Schmid, Gabor, LBP and HOG (Lisanti et al., 2015). Attributes (Layne et al., 2012) refer to semantic descriptions of people such as hair-style, shoe-type or clothing-style.

For metric learning, a distance metric is learned from training samples such that the inter-class distance is maximized whilst the intra-class distance is minimized. Many metric learning methods have been proposed for person re-id, such as Relative Distance Comparison (RDC) (Zheng et al., 2013), Large Margin Nearest Neighbor (LMNN) (Dikmen et al., 2010), Keep It Simple and Straightforward (KISSME) (Roth et al., 2012) and Pairwise Constrained Component Analysis (PCCA) (Jurie and Mignon, 2012).

While these methods achieve encouraging performance, they all learn only one unitary distance metric for all the heterogeneous features. The weaknesses are twofold. First, single metric learning is not robust against the complex nonlinear data structure in person re-id. Images of the same person may be far away from each other due to dramatic appearance variations, while images of different people may be very close to each other, e.g., two different people wearing similar colors or patterns (Jia et al., 2016). Second, single metric learning runs into the Small Sample Size (SSS) bottleneck, i.e. the number of training samples is far smaller than the feature dimension. Typically only hundreds of training samples are available whilst the feature dimension is often in the order of thousands or higher. Consequently, traditional single metric learning methods have to resort to dimensionality reduction and/or matrix regularization, which may lead to sub-optimal solutions and loss of discriminative power (Zhang et al., 2016a).
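The SSS bottleneck can be illustrated with a small sketch. A sample covariance matrix estimated from n samples in d dimensions has rank at most n - 1, so it is singular whenever n is not larger than d, and metrics that rely on an inverse covariance cannot be computed without dimensionality reduction or regularization. The toy data and the helper names `covariance` and `rank` below are illustrative assumptions, not part of the paper.

```python
from fractions import Fraction

def covariance(samples):
    """Sample covariance matrix, computed exactly with rational numbers."""
    n, d = len(samples), len(samples[0])
    mean = [sum(Fraction(s[j]) for s in samples) / n for j in range(d)]
    centered = [[Fraction(s[j]) - mean[j] for j in range(d)] for s in samples]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def rank(matrix):
    """Matrix rank via Gaussian elimination on exact fractions."""
    rows = [row[:] for row in matrix]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

# Hypothetical toy data: 3 training samples with 4-dimensional features (n < d).
samples = [[1, 2, 0, 4], [3, 1, 5, 0], [0, 6, 2, 7]]
cov = covariance(samples)
cov_rank = rank(cov)  # at most n - 1, hence below d: the matrix is singular
```

In real re-id settings the gap is far larger (hundreds of samples against thousands of dimensions), which is why learning one sub-metric per lower dimensional feature type eases the problem.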

Another issue with single metric learning is that it cannot deal with multiple feature representations directly. In person re-id, we usually have access to multiple heterogeneous features such as color histograms, textures and color names for each image. Each feature has its own characteristics and yields a different level of performance, so fusing the features is a hard task. In most metric learning algorithms, different features are concatenated into a high dimensional vector and a corresponding distance metric is learned from the combined vector.

This feature-level fusion scheme has several disadvantages compared to score-level fusion (Eisenbach, Kolarow, Vorndran, Niebling, 2015, Zheng, Wang, Tian, He, Liu, Tian, 2015b) and decision-level fusion (Liu, Wang, Wu, Yang, Yang, 2015, de Prates, Schwartz, 2015) approaches. First, the differing importance and discriminant capability of each individual feature are ignored. Each feature is treated equally with uniform weighting, regardless of its particular characteristics. Second, low dimensional features tend to be neglected when combined with high dimensional features (Liu et al., 2015), so their discriminant power might be lost. Third, the combination makes the feature dimension quite high, which easily leads to overfitting because pair or triplet-based constraints become much easier to satisfy in a high-dimensional feature space, resulting in poor generalization performance.
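The second disadvantage can be made concrete with a minimal sketch. It assumes a plain squared Euclidean distance on the concatenated vector and equal per-coordinate differences in both features; the dimensions (4 and 400) and all values are hypothetical and chosen only for illustration.

```python
def squared_distance(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

# Hypothetical descriptors of two images: a 4-D feature and a 400-D feature.
low_x, low_y = [0.1] * 4, [0.0] * 4
high_x, high_y = [0.1] * 400, [0.0] * 400

low_part = squared_distance(low_x, low_y)
high_part = squared_distance(high_x, high_y)

# On the concatenated vector the distance is the sum of the two parts,
# so the feature with more coordinates contributes most of it.
total = squared_distance(low_x + high_x, low_y + high_y)
low_share = low_part / total  # fraction contributed by the low dimensional feature
```

Under these assumptions the low dimensional feature accounts for roughly one percent of the concatenated distance, regardless of how discriminative it is on its own.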

In light of these problems, we propose a multiple metric learning method for the re-id problem, where the final metric is learned as a weighted sum of a set of sub-metrics. The sub-metrics are separately learned for each feature type, in contrast to learning a unitary metric for all the features, see Fig. 1. The sub-metrics can be learned with off-the-shelf metric learning methods, since sub-metric learning itself is not the focus of this work.

Now the problem arises: how to assign weights to the sub-metrics? We argue that different feature types should not be assigned the same weight as in single metric learning. Moreover, the importance of each feature type is not constant across all individuals. On the contrary, it should be measured according to the query in question. Certain appearance features can be more important than others in describing a specific query and distinguishing him/her from other people (Liu et al., 2012). For instance, if the query is wearing a bright shirt and pants without any texture, color features are clearly more important and should be given more weight. However, if the query is wearing a plaid shirt or the illumination change is too drastic to rely on color features, texture information becomes critical and should be given more weight. With this intuition, we propose a query adaptive weight learning strategy for each sub-metric.

Furthermore, although metric learning is conducted separately, the different feature types are not totally independent but still related. They are complementary and may share information with each other (Cui, Li, Xu, Shan, Chen, 2013, Hu, Lu, Yuan, Tan, 2015). Therefore we aim to learn a number of weights collaboratively which can guarantee correct ranking within all feature spaces. To this end, by modeling the ranking in each feature space as a separate task, we utilize a maximum-margin based multi-task learning framework to jointly learn the weights. The framework treats the ranking of various feature types as different but related tasks and enables information propagation among tasks. By utilizing relatedness among different tasks, the framework can guarantee that within all feature spaces, images relevant to the query are all ranked before irrelevant ones. The multi-task ranking model is superior to traditional ranking methods (Paisitkriangkrai, Shen, van den Anton, 2015, Wu, Mukunoki, Funatomi, Minoh, Lao, 2011), especially when the training sample size is small for each task (Su et al., 2015).

In this paper, we propose a multiple metric learning method with a two-step weighting procedure for the re-id problem. Multiple sub-metrics are learned separately and linearly combined to form the final metric. The sub-metrics are learned with techniques from XQDA (Liao et al., 2015) because of its effectiveness and high computational efficiency. The algorithm learns a discriminant low dimensional subspace and the derived metric at the same time. The weights of the sub-metrics are assigned in a two-step procedure. First, a query adaptive weight is assigned to each sub-metric. The weight is estimated with pAUC scores, which can measure the discriminative ability of different features, as stated in Zheng et al. (2015b), Eisenbach et al. (2015) and Zhao et al. (2014). Second, to reach a more sophisticated decision, re-weighting parameters that can effectively rank relevant gallery images before irrelevant ones are simultaneously learned with a multi-task structural SVM learning framework. The ranking within each feature space is modeled as a task, and we model the task relatedness in a way that all tasks are close to their mean, following Pontil (2004). Finally, the weighted sub-metrics are integrated in an ensemble model. In this way, the discriminant information of each feature type is effectively exploited. The feature dimension of each metric learning task is reduced, and thus overfitting is alleviated. Our method performs well and runs fast even when only a few samples are available. Experiments on four challenging datasets, i-LIDS (Zheng et al., 2009), VIPeR (Gray and Tao, 2008), CAVIAR (Dong et al., 2011) and 3DPeS (Baltieri et al., 2011), demonstrate the effectiveness of the proposed method.
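To give a rough idea of how pAUC scores could be turned into per-feature weights, the following sketch computes a generic partial AUC for each feature type and normalizes the scores. It is not the paper's query adaptive algorithm, whose details are not given in this section; the function names, the `max_fpr` cut-off, the normalization and the toy distances are all assumptions made for illustration.

```python
import math
from typing import List, Sequence

def partial_auc(distances: Sequence[float], labels: Sequence[int],
                max_fpr: float = 0.5) -> float:
    """Normalized partial AUC over the false positive range [0, max_fpr].

    distances: query-to-gallery distances (smaller means more similar).
    labels: 1 for a relevant gallery image, 0 for an irrelevant one.
    """
    pos = [d for d, l in zip(distances, labels) if l]
    neg = sorted(d for d, l in zip(distances, labels) if not l)
    if not pos or not neg:
        raise ValueError("need at least one relevant and one irrelevant image")
    # Only the hardest negatives (smallest distances) fall inside the range.
    k = max(1, math.ceil(max_fpr * len(neg)))
    hardest = neg[:k]
    # Count correctly ordered (relevant, irrelevant) pairs; ties count half.
    correct = sum(1.0 if p < n else 0.5 if p == n else 0.0
                  for p in pos for n in hardest)
    return correct / (len(pos) * k)

def weights_from_pauc(scores: Sequence[float]) -> List[float]:
    """Normalize pAUC scores so the weights sum to one (assumed scheme)."""
    total = sum(scores)
    if total == 0:
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]

# Hypothetical distances from one query to four gallery images,
# measured in two feature spaces; the first gallery image is relevant.
labels = [1, 0, 0, 0]
color_dist = [0.1, 0.5, 0.9, 0.7]
texture_dist = [0.6, 0.5, 0.9, 0.7]

scores = [partial_auc(color_dist, labels), partial_auc(texture_dist, labels)]
weights = weights_from_pauc(scores)
```

In this toy case the color feature ranks the relevant image ahead of all hard negatives and therefore receives the larger weight.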

The main contributions of the proposed method are:

  • A metric ensemble model is proposed in which the final metric is represented as a weighted sum of multiple sub-metrics, which is more discriminative and effective than holistic metric learning.

  • A two-step weighting strategy is proposed, which assigns weights to the sub-metrics both separately and collaboratively. Separately, the weight of each sub-metric is learned adaptively to the query. Collaboratively, the weights are set to guarantee proper ranking within all feature spaces.

  • A novel algorithm for calculating query adaptive weights is proposed, which considers the similarity label between query and gallery images in computing pAUC scores.

  • Structural SVM learning is extended to a multi-task framework which tailors the feature map and loss function to a specific feature type.

The rest of the paper is organized as follows: a brief review of related work is presented in Section 2. The metric learning algorithm, query adaptive weighting strategy and multi-task structural SVM learning are introduced in Section 3. Experimental results are presented in Section 4. Finally, concluding remarks and suggestions for future work are given in Section 5.

Related work

A. Person re-identification

Existing person re-id methods can be roughly divided into two categories: feature based and metric learning based. Feature based approaches focus on designing a feature representation that is both distinctive and robust to large appearance variations. Different kinds of features have been proposed in previous work, for instance, histograms from various color and textures (Lisanti, Masi, Bagdanov, Bimbo, 2015, Liu, Wang, Wu, Yang, Yang, 2015, Zheng, Gong, Xiang, 2013),

Proposed method

In contrast to learning a universal metric, we aim to learn a set of preliminary sub-metrics, each computed on a specific feature type, which are then linearly combined to build a sophisticated metric using query adaptive weights and multi-task structural SVM weight learning. The basic flow of the proposed method is depicted in Fig. 2. More concretely, the final metric is constructed as follows: $M = \sum_{i=1}^{T} w_i M_i$, where $T$ is the number of feature types and $w_i$, $i = 1, 2, \ldots, T$, are the learned weights.
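The weighted combination can be sketched as follows. The sketch assumes that each sub-metric is a Mahalanobis matrix acting on its own feature type, so that the ensemble distance is the weighted sum of the per-feature squared distances; this reading, the function names and the toy values are illustrative assumptions rather than the authors' implementation.

```python
from typing import List, Sequence

Matrix = List[List[float]]

def mahalanobis_sq(x: Sequence[float], y: Sequence[float], M: Matrix) -> float:
    """Squared Mahalanobis distance (x - y)^T M (x - y)."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return sum(d[i] * M[i][j] * d[j] for i in range(n) for j in range(n))

def ensemble_distance(xs: Sequence[Sequence[float]],
                      ys: Sequence[Sequence[float]],
                      metrics: Sequence[Matrix],
                      weights: Sequence[float]) -> float:
    """Weighted sum over feature types of the sub-metric distances."""
    return sum(w * mahalanobis_sq(x, y, M)
               for x, y, M, w in zip(xs, ys, metrics, weights))

# Hypothetical example with T = 2 feature types (a 2-D and a 1-D feature).
query = [[1.0, 0.0], [3.0]]
gallery = [[0.0, 0.0], [1.0]]
sub_metrics = [[[1.0, 0.0], [0.0, 1.0]], [[2.0]]]  # stand-ins for learned M_i
w = [0.5, 0.25]                                    # stand-ins for learned w_i

dist = ensemble_distance(query, gallery, sub_metrics, w)  # 0.5 * 1 + 0.25 * 8
```

Keeping the sub-metrics separate means each matrix is only as large as its own feature dimension, which is the property the method relies on to limit overfitting.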

Experimental results

In this section, four challenging datasets are adopted for evaluation, i.e. i-LIDS (Zheng et al., 2009), VIPeR (Gray and Tao, 2008), CAVIAR (Dong et al., 2011) and 3DPeS (Baltieri et al., 2011). These datasets possess different characteristics (outdoor/indoor, large/small variations in view angle, constant/varying image scale, presence/absence of occlusion) and give a faithful representation of real-world challenges for person re-id. Some samples are illustrated in Fig. 4.

Setting: We

Conclusion

In this paper, we present an efficient and effective method for person re-id. Instead of single metric learning, we propose to learn multiple sub-metrics for the problem, one sub-metric for each feature type. The sub-metrics are learned with Cross-view Quadratic Discriminant Analysis, which learns a discriminant low dimensional subspace and derived metric at the same time. The sub-metrics are linearly combined to form the final metric with weights learned by a two-step procedure. The first step

Acknowledgment

This work was supported in part by the National Natural Science Foundation of China (61471032, 61472030, 61403024), the Program for Innovative Research Team in University of Ministry of Education of China (IRT-16R04), and the Beijing Natural Science Foundation (No. 4163075).

References (59)

  • S. Ding et al.

    Deep feature learning with relative distance comparison for person re-identification

    Pattern Recognit.

    (2015)
  • J. Jia et al.

    Geometric preserving local fisher discriminant analysis for person re-identification

    Neurocomputing

    (2016)
  • V. Vapnik et al.

    A new learning paradigm: learning using privileged information

    Neural Netw.

    (2009)
  • T. Avraham et al.

    Learning implicit transfer for person re-identification

    Proceedings of the International Conference on Computer Vision

    (2012)
  • D. Baltieri et al.

    3DPeS: 3D people dataset for surveillance and forensics

    International ACM Workshop on Multimedia Access to 3D Human Objects

    (2011)
  • D. Chen et al.

    Similarity learning with spatial constraints for person re-identification

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2016)
  • S.Z. Chen et al.

    Deep ranking for person re-identification via joint representation learning

    IEEE Trans. Image Process.

    (2015)
  • Y.-C. Chen et al.

    Mirror representation for modeling view-specific transform in person re-identification

    International Joint Conference on Artificial Intelligence

    (2015)
  • Z. Cui et al.

    Fusing robust face region descriptors via multiple metric learning for face recognition in the wild

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2013)
  • J.V. Davis et al.

    Information-theoretic metric learning

    Machine Learning, Proceedings of the Twenty-Fourth International Conference

    (2007)
  • M. Dikmen et al.

    Pedestrian recognition with a learned metric

    Proceedings of the Asian Conference on Computer Vision

    (2010)
  • S.C. Dong et al.

    Custom pictorial structures for re-identification

    British Machine Vision Conference

    (2011)
  • M. Eisenbach et al.

    Evaluation of multi feature fusion at score-level for appearance-based person re-identification

    International Joint Conference on Neural Networks (IJCNN)

    (2015)
  • M. Farenzena et al.

    Person re-identification by symmetry-driven accumulation of local features

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2010)
  • C. Feng et al.

    Generalized SMO algorithm for SVM-based multitask learning

    IEEE Trans. Neural Networks Learn. Syst.

    (2012)
  • A. Globerson et al.

    Metric learning by collapsing classes

    Adv. Neural Inf. Process. Syst.

    (2005)
  • D. Gray et al.

    Viewpoint invariant pedestrian recognition with an ensemble of localized features

    European Conference on Computer Vision

    (2008)
  • J. Hu et al.

    Large margin multi-metric learning for face and kinship verification in the wild

    Proceedings of the Asian Conference on Computer Vision

    (2015)
  • F. Jurie et al.

    PCCA: a new approach for distance learning from sparse pairwise constraints

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2012)
  • M. Kostinger et al.

    Synergy-based learning of facial identity

    Proc. DAGM Symposium

    (2012)
  • R. Layne et al.

    Person re-identification by attributes

    British Machine Vision Conference

    (2012)
  • Z. Li et al.

    Learning locally-adaptive decision functions for person verification

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2013)
  • L. Liang et al.

    Connection between SVM+ and multi-task learning

    IEEE International Joint Conference on Neural Networks

    (2008)
  • S. Liao et al.

    Person re-identification by local maximal occurrence representation and metric learning

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2015)
  • S. Liao et al.

    Efficient PSD constrained asymmetric metric learning for person re-identification

    Proceedings of the IEEE International Conference on Computer Vision

    (2015)
  • G. Lisanti et al.

    Person re-identification by iterative re-weighted sparse ranking

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2015)
  • C. Liu et al.

    Person re-identification: what features are important?

    European Conference on Computer Vision, International Workshop on Re-Identification

    (2012)
  • X. Liu et al.

    An ensemble color model for human re-identification

    Applications of Computer Vision

    (2015)
  • A.J. Ma et al.

    Cross-domain person reidentification using domain adaptation ranking SVMs

    IEEE Trans. Image Process.

    (2015)