Elsevier

Neurocomputing

Volume 151, Part 3, 3 March 2015, Pages 1283-1292

Set-label modeling and deep metric learning on person re-identification

https://doi.org/10.1016/j.neucom.2014.11.002

Abstract

Person re-identification aims at matching individuals across multiple non-overlapping adjacent cameras. By treating the multiple gallery images of a person as a whole, we propose a novel method named Set-Label Model (SLM) to improve person re-identification performance under the multi-shot setting. We use mutual information to measure the relevance between a query image and each gallery set, and, to reduce computational complexity, we apply a Naive–Bayes Nearest-Neighbor algorithm to approximate the mutual-information value. To overcome the limitations of traditional linear metric learning, we further develop a deep non-linear metric learning (DeepML) approach based on Neighborhood Component Analysis and a Deep Belief Network. To evaluate the effectiveness of the proposed approaches, SLM and DeepML, we carried out extensive experiments on two challenging datasets, i-LIDS and ETHZ. The experimental results demonstrate that the proposed methods outperform state-of-the-art methods.

Introduction

In recent years, person re-identification (Re-Id) has attracted increasing attention in video surveillance. It aims to match people across multiple non-overlapping cameras, for example, identifying people across the views of a multi-camera network, or recognizing a person who disappears from one camera and later reappears in another. It can also be embedded in broader applications such as tracking and target re-acquisition.

According to the experimental setting, Re-Id methods can be divided into two groups: single-shot and multi-shot. The former uses only one image per person, while the latter uses multiple images as the signature of each person ID (class label). Re-Id is challenging because it suffers from illumination changes, low resolution, and view variations across cameras. Among recent efforts, one line of Re-Id methods focuses on designing discriminative features [1], [2], [3], [4], [5], [6], [7]. The other line exploits supervised information to find a global linear transformation that re-weights feature dimensions (e.g., learning a Mahalanobis distance) [8], [9], [10], [11].
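As a concrete illustration of the linear metric-learning family mentioned above, the following minimal sketch computes a squared Mahalanobis distance (x - y)^T M (x - y) with M = L^T L. It checks that this equals the squared Euclidean distance after both features are mapped through the linear transformation L. The matrix L and the helper names (`mahalanobis_sq`, `mat_mul`, and so on) are hypothetical and chosen only for illustration. Methods such as [8], [9], [10], [11] would learn L (or M) from labeled data, and that learning step is not shown here.

```python
# Minimal sketch: a Mahalanobis distance induced by a linear map L.
# L is a hypothetical, hand-picked matrix; metric-learning methods would
# learn it from labeled data instead.

def mat_vec(A, v):
    """Multiply matrix A (list of rows) by vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def mat_mul(A, B):
    """Matrix product A @ B for lists of rows."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis distance (x - y)^T M (x - y)."""
    d = [a - b for a, b in zip(x, y)]
    Md = mat_vec(M, d)
    return sum(a * b for a, b in zip(d, Md))

def euclidean_sq(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

# Hypothetical linear transformation that re-weights and mixes 3 dimensions.
L = [[2.0, 0.0, 0.0],
     [0.0, 0.5, 0.0],
     [1.0, 0.0, 1.0]]
M = mat_mul(transpose(L), L)  # M = L^T L is symmetric positive semidefinite

x = [1.0, 2.0, 3.0]
y = [0.0, 1.0, 1.0]

d_metric = mahalanobis_sq(x, y, M)                     # distance under M
d_mapped = euclidean_sq(mat_vec(L, x), mat_vec(L, y))  # Euclidean after L
print(d_metric, d_mapped)
```

The two printed values coincide. This is why learning a Mahalanobis metric is equivalent to learning a global linear transformation of the features, which is the limitation that a non-linear mapping aims to lift.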

In this paper, we first propose a Set-Label Model (SLM) to improve the performance of person re-identification under the multi-shot setting. SLM has three steps. First, we define a set-based structure for each class, which contains the concatenation of the query feature with the gallery features of that class; for simplicity, we refer to this set-based structure as a SET. We use mutual information to measure the relationship between the features and their class label. Second, since the form of the class-conditional feature distribution can hardly be assumed in advance, we apply the Naive–Bayes Nearest-Neighbor algorithm (NBNN) [12] to approximate the mutual-information value instead of computing the probabilities directly. NBNN also offers a significant gain in efficiency. Finally, the mutual-information values are ranked in descending order, and the class label with the highest value is assigned to the query.
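The NBNN-based ranking step can be sketched as follows. This is a minimal illustration under our own assumptions, not the paper's implementation. Each image is taken to be a bag of local descriptors, and all gallery images of one identity are pooled into a single descriptor set. The classic NBNN image-to-class score of [12] stands in for the mutual-information value: it is the negative sum, over query descriptors, of the squared distance to the nearest descriptor in that identity's pooled set. Identity names, descriptor values, and helper names are hypothetical.

```python
# Minimal NBNN-style ranking sketch. ASSUMPTIONS: each image is a bag of
# local descriptors; all gallery images of one identity are pooled into a
# single descriptor set; the NBNN score stands in for the mutual-information
# value used by SLM. All data below is hypothetical.

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nbnn_score(query_descriptors, class_descriptors):
    """Negative image-to-class distance: higher means more relevant."""
    total = 0.0
    for d in query_descriptors:
        # Distance to the nearest descriptor anywhere in the class's set.
        total += min(sq_dist(d, c) for c in class_descriptors)
    return -total

def rank_identities(query_descriptors, gallery):
    """gallery maps identity -> list of images, each a list of descriptors."""
    scores = {}
    for identity, images in gallery.items():
        pooled = [desc for image in images for desc in image]  # set-level pool
        scores[identity] = nbnn_score(query_descriptors, pooled)
    # Rank in descending order of score; the top identity is assigned.
    return sorted(scores, key=scores.get, reverse=True), scores

gallery = {
    "person_A": [[[0.1, 0.9], [0.2, 0.8]], [[0.9, 0.1]]],
    "person_B": [[[0.5, 0.5]], [[0.6, 0.4], [0.4, 0.6]]],
}
query = [[0.15, 0.85], [0.85, 0.15]]

ranking, scores = rank_identities(query, gallery)
print(ranking[0])
```

Pooling is what makes this set-based. The two query descriptors are matched in different gallery images of `person_A`, so no single gallery image explains the query as well as the pooled set does.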

Using the labeled data, we further develop a deep non-linear metric learning method named DeepML based on Neighborhood Component Analysis (NCA) [13] and the Deep Belief Network (DBN) [14]. NCA learns a data transformation that maximizes the expected number of correctly classified samples in the training data. It therefore benefits algorithms that rely on the distance between two features, such as k-nearest-neighbor classification. To extend the linear transformation of NCA, we use a DBN to learn a nonlinear feature transformation. The NCA objective is placed on the top RBM layer to adjust its weights, and fine-tuning then adjusts the weights of the other layers. In this way, the learned metric (transformation) enhances the discriminative power of the features.
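The NCA objective described above can be evaluated with a short function. The sketch uses the usual NCA soft-neighbor probabilities, p_ij = exp(-||f(x_i) - f(x_j)||^2) / sum_{k != i} exp(-||f(x_i) - f(x_k)||^2). The objective is the expected number of correctly classified training points, sum_i sum_{j: y_j = y_i} p_ij. As an assumption for illustration only, the map f is a hypothetical one-layer tanh transformation with fixed weights standing in for the pretrained and fine-tuned DBN. Training by gradient ascent on this objective is omitted.

```python
import math

# Minimal sketch of the NCA objective evaluated on top of a nonlinear map.
# ASSUMPTION: f is a hypothetical one-layer tanh mapping with fixed weights,
# standing in for the DBN that DeepML pretrains and fine-tunes.

def f(x, W, b):
    """Hypothetical nonlinear feature map: tanh(W x + b)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bj)
            for row, bj in zip(W, b)]

def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def nca_objective(X, y, W, b):
    """Expected number of correctly classified points under stochastic
    nearest-neighbor assignment in the mapped space."""
    Z = [f(x, W, b) for x in X]
    n = len(Z)
    total = 0.0
    for i in range(n):
        # Softmax over negative squared distances to all other points.
        weights = [math.exp(-sq_dist(Z[i], Z[k])) if k != i else 0.0
                   for k in range(n)]
        norm = sum(weights)
        # Probability that point i picks a neighbor of its own class.
        total += sum(weights[j] for j in range(n)
                     if j != i and y[j] == y[i]) / norm
    return total

X = [[0.0, 0.1], [0.1, 0.0], [2.0, 2.1], [2.1, 2.0]]
y = [0, 0, 1, 1]
W = [[1.0, 0.5], [-0.5, 1.0]]
b = [0.0, 0.0]

score = nca_objective(X, y, W, b)
print(round(score, 3))  # bounded above by len(X)
```

A training procedure would adjust `W` and `b` (and, in DeepML, the deeper layers via fine-tuning) to increase this value. Labels that disagree with the geometry of the mapped space give a much lower score.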

This paper makes two main contributions. (1) We model the relevance among multiple image features by mutual information, and we apply NBNN as an approximation algorithm to estimate the mutual-information value instead of directly computing the probabilities. To our knowledge, this is the first time mutual information has been applied to person re-identification. (2) Using the labeled images, we develop a deep non-linear metric learning method to improve the discriminative power of our features. Whereas most metric learning methods learn a linear transformation, we use a deep learning architecture to provide a non-linear mapping from the original features to new features.
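One way to see why NBNN can stand in for mutual information is sketched below. This derivation is our own illustration, not reproduced from the paper. It assumes pointwise mutual information between a query's descriptor set Q = {d_1, ..., d_n} and a gallery identity C, and a Gaussian Parzen estimate of p(d | C) over the identity's pooled descriptor set D_C, as in [12]. Here κ is the kernel's normalization constant and L_C is the number of descriptors in D_C. The estimate is then approximated by its nearest-neighbor term, which dominates when σ is small. The final step further assumes that every identity contributes equally many descriptors.

```latex
% Pointwise mutual information between query descriptors Q and identity C
i(Q;C) = \log \frac{p(Q \mid C)}{p(Q)} = \log p(Q \mid C) - \log p(Q)

% Naive-Bayes assumption: descriptors are independent given C
\log p(Q \mid C) = \sum_{i=1}^{n} \log p(d_i \mid C)

% Gaussian Parzen estimate over the pooled descriptor set D_C
\hat{p}(d \mid C) = \frac{\kappa}{L_C} \sum_{c \in D_C} \exp\!\left(-\frac{\|d - c\|^2}{2\sigma^2}\right)

% Keep only the nearest-neighbor term NN_C(d)
\hat{p}(d \mid C) \approx \frac{\kappa}{L_C} \exp\!\left(-\frac{\|d - \mathrm{NN}_C(d)\|^2}{2\sigma^2}\right)

% Hence
\log p(Q \mid C) \approx -\frac{1}{2\sigma^2} \sum_{i=1}^{n} \|d_i - \mathrm{NN}_C(d_i)\|^2 + n \log \frac{\kappa}{L_C}

% With equal L_C for all identities, and p(Q) independent of C:
\hat{C} = \arg\max_C \, i(Q;C) = \arg\min_C \sum_{i=1}^{n} \|d_i - \mathrm{NN}_C(d_i)\|^2
```

Under these assumptions, ranking identities by mutual information reduces to ranking them by the NBNN image-to-class distance. The density p(d | C) never has to be modeled explicitly.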

We evaluate SLM and DeepML on two benchmark datasets, i-LIDS and ETHZ, both of which contain multiple images per person and exhibit changes in illumination, viewing angle, and resolution, as well as occlusion. The experimental results demonstrate that SLM obtains 100 percent matching accuracy on ETHZ after rank 3 using only simple HSV color features, and that combining SLM with deep non-linear metric learning (DeepML) yields additional improvements.

The remainder of this paper is organized as follows: Section 2 reviews related work, Section 3 describes SLM and DeepML in detail, Section 4 presents the experimental results, and Section 5 draws conclusions and outlines future work.


Related work

Recently, person re-identification, which aims at matching the same individual across multiple disjoint cameras, has received increasing attention in video surveillance. To improve Re-Id performance, existing work mainly focuses on two aspects: appearance feature extraction and distance metric learning.

Appearance-based methods mainly rely on designing descriptive features, such as low-dimensional discriminative features [1], viewpoint-invariant features [2], [15], accumulation

The proposed method

In this section, we introduce our proposed person Re-Id method, which consists of two parts: SLM and DeepML. Section 3.1 introduces the feature modeling approach SLM. Specifically, we construct a SET from the query feature and a gallery set, and use mutual information to measure the relationship between features and their labels. In Section 3.2, we present a nearest-neighbor-based algorithm [12] to estimate the mutual-information value. In Section 3.3, to enhance the discriminative

Experiments

In this section, to demonstrate the effectiveness of the proposed methods, we evaluate them on two public datasets: i-LIDS and ETHZ. For evaluation, we use the standard Cumulative Match Characteristic (CMC) curve, whose vertical axis denotes the correct matching rate and whose horizontal axis denotes the rank k. The CMC curve therefore shows how often the correct match appears among the top-k ranked pedestrian images.
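The CMC measurement described above can be computed from a query-by-gallery distance matrix. The sketch below is a minimal illustration under our own assumptions, not the evaluation code used in the paper. It assumes that each query has exactly one true gallery identity and that a lower distance means a better match. The function name `cmc_curve` and all numbers are hypothetical.

```python
# Minimal CMC sketch. ASSUMPTIONS: distances[q][g] is the distance between
# query q and gallery identity g (lower = better); each query has exactly one
# true gallery identity, given by true_ids[q]. All numbers are hypothetical.

def cmc_curve(distances, true_ids, max_rank):
    """Return [CMC(1), ..., CMC(max_rank)]: the fraction of queries whose
    true identity appears among the top-k ranked gallery identities."""
    hits = [0] * max_rank
    for q, row in enumerate(distances):
        # Gallery indices sorted from best (smallest distance) to worst.
        order = sorted(range(len(row)), key=lambda g: row[g])
        pos = order.index(true_ids[q])  # 0-based position of the true match
        # A match at 0-based position pos is a hit for every top-k list
        # with k >= pos + 1, i.e. hits[pos], hits[pos + 1], ...
        for k in range(pos, max_rank):
            hits[k] += 1
    return [h / len(distances) for h in hits]

distances = [
    [0.2, 0.9, 0.7],  # query 0: true identity 0 ranked first
    [0.4, 0.3, 0.8],  # query 1: true identity 2 ranked third
    [0.5, 0.6, 0.1],  # query 2: true identity 0 ranked second
]
true_ids = [0, 2, 0]

print(cmc_curve(distances, true_ids, max_rank=3))
```

Because a hit at rank k also counts for every larger rank, the curve is non-decreasing. In this toy example it rises from 1/3 at rank 1 to 1.0 at rank 3.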

Datasets: In our validation, we use two standard datasets:

Conclusion and future work

In this paper, we reformulate Re-Id as a set-based classification problem from the perspective of information theory. Specifically, we define a set-based structure between the query image and the gallery images, and characterize the relationship between the set and the class label by mutual information. Further, we propose a non-linear metric learning method (DeepML), which is based on NCA and DBN. The proposed DeepML can both introduce supervised information into SLM and improve the

Acknowledgments

This work was supported in part by the National Basic Research Program of China (973 Program): 2012CB316400, and in part by the National Natural Science Foundation of China: 61025011, 61133003, 61332016, 61390510, 61303154.


References (33)

  • W. Schwartz, L. Davis, Learning discriminative appearance-based models using partial least squares, in: Computer...
  • D. Gray, H. Tao, Viewpoint invariant pedestrian recognition with an ensemble of localized features, in: European...
  • M. Farenzena, L. Bazzani, A. Perina, V. Murino, M. Cristani, Person re-identification by symmetry-driven accumulation...
  • L. Bazzani, M. Cristani, A. Perina, M. Farenzena, V. Murino, Multiple-shot person re-identification by HPE signature,...
  • B. Ma, Y. Su, F. Jurie, Bicov: a novel image representation for person re-identification and face verification, in:...
  • R. Layne, T. Hospedales, S. Gong, Person re-identification by attributes, in: British Machine Vision Conference,...
  • B. Ma, Y. Su, F. Jurie, Local descriptors encoded by Fisher vectors for person re-identification, in: International...
  • M. Köstinger, M. Hirzer, P. Wohlhart, P. Roth, H. Bischof, Large scale metric learning from equivalence constraints,...
  • M. Hirzer, P. Roth, M. Köstinger, H. Bischof, Relaxed pairwise learned metric for person re-identification, in: European...
  • W. Zheng, S. Gong, T. Xiang, Transfer re-identification: from person to set-based verification, in: IEEE Conference on...
  • W. Zheng, S. Gong, T. Xiang, Person re-identification by probabilistic relative distance comparison, in: IEEE...
  • O. Boiman, E. Shechtman, M. Irani, In defense of nearest-neighbor based image classification, in: IEEE Conference on...
  • R. Salakhutdinov, G. Hinton, Learning a nonlinear embedding by preserving class neighbourhood structure, in:...
  • G. Hinton, R. Salakhutdinov, Reducing the dimensionality of data with neural networks, vol. 313, 2006, pp....
  • L. Qin, W. Gao, Image matching based on a local invariant descriptor, in: IEEE International Conference on Image...
  • J. Davis, B. Kulis, P. Jain, S. Sra, I. Dhillon, Information-theoretic metric learning, in: International...

    Hao Liu received the B.S. degree in software engineering from Sichuan University, China, in 2011 and the Engineering Master degree in computer technology from University of Chinese Academy of Sciences, China, in 2014. He is currently pursuing the Ph.D. degree at the department of automation, Tsinghua University. His research interests include medical imaging, computer vision, and deep learning.

    Bingpeng Ma received the B.S. degree in mechanics in 1998 and the M.S. degree in mathematics in 2003 from Huazhong University of Science and Technology. He received the Ph.D. degree in computer science from the Institute of Computing Technology, Chinese Academy of Sciences, P.R. China, in 2009. He was a postdoctoral researcher at the University of Caen, France, from 2011 to 2012. He joined the School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing, in March 2013, where he is now an assistant professor. His research interests cover image analysis, pattern recognition, and computer vision.

    Lei Qin received the B.S. and M.S. degrees in mathematics from the Dalian University of Technology, Dalian, China, in 1999 and 2002, respectively, and the Ph.D. degree in computer science from the Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China, in 2008. He is currently an associate professor with the Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China. His research interests include image/video processing, computer vision, and pattern recognition. He has authored or coauthored over 40 technical papers in the area of computer vision. Dr. Qin is a reviewer for IEEE Transactions on Multimedia, IEEE Transactions on Circuits and Systems for Video Technology, and IEEE Transactions on Cybernetics. He has served as a TPC member for various conferences, including ECCV, ICPR, ICME, PSIVT, ICIMCS, and PCM.

    Junbiao Pang received the B.S. and the M.S. degrees in computational fluid dynamics and computer science from the Harbin Institute of Technology, Harbin, China, in 2002 and 2004, respectively, and the Ph.D. degree from the Institute of Computing Technology, Chinese Academy of Sciences, Beijing, in 2011. He is currently a faculty member with the College of Metropolitan Transportation, Beijing University of Technology, Beijing, China. His research areas include computer vision, multi-media and machine learning, and he has authored or coauthored approximately ten technical papers.

    Chunjie Zhang received his Ph.D. degree in Pattern Recognition and Intelligent Systems from the Institute of Automation, Chinese Academy of Sciences, China, in 2011. He received his B.E. degree from Nanjing University of Posts and Telecommunications, China, in 2006. He worked as an engineer at the Henan Electric Power Research Institute during 2011–2012. He is currently working as a postdoc at the Graduate University of Chinese Academy of Sciences, Beijing, China. Dr. Zhang's current research interests include image processing, machine learning, cross media content analysis, pattern recognition and computer vision.

    Qingming Huang (SM'08) received the B.S. degree in computer science and Ph.D. degree in Computer Engineering from Harbin Institute of Technology, Harbin, China, in 1988 and 1994, respectively. He is currently a professor with the University of the Chinese Academy of Sciences (CAS), China, and an Adjunct Research Professor with the Institute of Computing Technology, CAS. His research areas include multimedia computing, image processing, computer vision, pattern recognition and machine learning. He has published more than 200 academic papers in prestigious international journals including IEEE Transactions on Multimedia, IEEE Transactions on CSVT and IEEE Transactions on Image Processing, and top-level conferences such as ACM Multimedia, ICCV, CVPR and ECCV. He is an associate editor of Acta Automatica Sinica and a reviewer for various international journals including IEEE Transactions on Multimedia, IEEE Transactions on CSVT, and IEEE Transactions on Image Processing. He has served as a program chair, track chair and TPC member for various conferences, including ACM Multimedia, CVPR, ICCV, ICME and PSIVT.
