Locality based discriminative measure for multiple-shot human re-identification

doi:10.1016/j.neucom.2015.04.068

Neurocomputing

Volume 167, 1 November 2015, Pages 280-289

https://doi.org/10.1016/j.neucom.2015.04.068

Abstract

Multiple-shot human re-identification is an important issue in both academia and industry. It addresses the problem of building correspondences among object instances appearing in a camera network using biometric cues. Among all possible cues, the face is a typical one that has long been investigated, while the whole body has become a recent trend. This problem is challenging because of small intra-class similarities and small inter-class dissimilarities caused by the variations of human appearance in real scenarios. Existing methods mainly involve designing a representation and/or devising a measure to explore the within-class compactness and between-class separation. Although encouraging progress has been made, the results are still far from satisfactory. In this paper, we propose a novel set-based matching model, “Locality Based Discriminative Measure”, to re-identify the human body when a set of test samples is available for each person. A new set-to-set dissimilarity is crafted considering both majorities and minorities of samples from each pair of sets. The discriminability of this dissimilarity is then further exploited by the local metric field; it can thereby serve as a more capable low-level measure to support the high-level measure for the final matching. Extensive experiments on widely used benchmarks demonstrate that our proposal remarkably outperforms state-of-the-art methods.

Introduction

Human re-identification, which determines the re-appearance of a person previously observed by deployed cameras, is a valuable but challenging visual surveillance task for both academia and industry. It has a wide range of applications, such as off-line video retrieval and on-line tracking. Among biometric cues, the face is a crucial one and is commonly used to distinguish human identities [1], [2]. However, in some real-world situations, the low quality of the images makes face re-identification impossible; therefore, whole-body based re-identification has been given increasing attention. Nevertheless, unavoidable pose variations, illumination changes, viewpoint alterations, occlusions, and possibly similar body shapes and clothing styles present significant challenges to this approach.

This paper addresses multiple-shot human re-identification, in which multiple body images are available in the query and corpus domains for each identity in question. The relevant methodologies can be categorized into two paradigms. Some focus on reliable representation [3], [4], [5], [6], [7], [8]. Although a sound representation can characterize human appearance, the complexity of real-world situations and the subjectivity of hand-designing features inevitably impede performance gains. Other solutions rely on a robust measure for the representation to address these problems [9], [10], [11], [12], [13], [14]. Their inspiring results stem from improved intra-class compactness and inter-class separation. However, a lack of significant progress in this direction has also slowed the performance improvement.

Feature representations of human images can be considered as points in the topological space. We define a group of features extracted from multiple-shot images of the same person to be one whole set. Based on this definition, the key problem of multiple-shot human re-identification involves determining a suitable dissimilarity measure for accurately matching the sets between query and corpus domains. This type of measure can be explored from two aspects: one is to adapt the point-based distance to the set level; the other is to seek the underlying metrics for the sets. Exploration and collaboration of these two aspects could lead to a significant advancement in this important yet unresolved area in computer vision. Based on this objective, we propose a novel set-based matching model, referred to as “Locality Based Discriminative Measure (LBDM)”, to re-identify the human images.

As depicted in Fig. 1, LBDM comprises three primary steps: set-to-set dissimilarity crafting, local metric field construction, and set-based matching. The first step involves designing a novel set-to-set dissimilarity for multiple-shot human images. This dissimilarity transfers the sample-based distance to the set level, while providing a tool to determine the neighboring sets of each set, as needed in the local metric learning framework. Local metric learning is intended to pull samples of the same set closer together and push those of its neighboring sets farther away, ensuring a more reliable set-to-set dissimilarity measure. With this more reliable measure, the set-level oriented common-near-neighbor modeling can be fully leveraged for effective matching.

In sum, the main contributions of this paper are as follows:

• We craft a novel set-to-set dissimilarity as a low-level measure. By considering the local minorities of samples in each set, the measure can preserve within-set variability; by determining the global distance between majorities of samples from paired sets, the measure is robust to the irregular outliers (Section 3.1).
• We introduce a new local metric learning model. It constructs the local metric field, in which the discriminability of the new set-to-set distance is greatly enhanced into a middle-level measure (Section 3.2).
• We present an effective set-based matching framework. It propagates the effectiveness of the middle-level dissimilarity measure to the high-level dissimilarity measure presented by set-level common-near-neighbor modeling for final matching (Section 3.3).
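
The first contribution can be illustrated with a small sketch. It is not the paper's formulation, which is given in Section 3.1; it only shows how a minority term and a majority term might be combined. Taking the mean of the `k` smallest cross-set distances as the minority term, the distance between coordinate-wise medians as the majority term, and the weight `alpha` are all assumptions made for this example, as are the function names and toy data.

```python
import math
from itertools import product
from statistics import median


def minority_term(set_a, set_b, k=2):
    """Mean of the k smallest cross-set distances (assumed minority term)."""
    dists = sorted(math.dist(a, b) for a, b in product(set_a, set_b))
    k = min(k, len(dists))
    return sum(dists[:k]) / k


def majority_term(set_a, set_b):
    """Distance between coordinate-wise medians (assumed majority term)."""
    med_a = [median(col) for col in zip(*set_a)]
    med_b = [median(col) for col in zip(*set_b)]
    return math.dist(med_a, med_b)


def set_dissimilarity(set_a, set_b, alpha=0.5, k=2):
    """Hypothetical weighted blend of the two terms; alpha is illustrative."""
    minority = minority_term(set_a, set_b, k)
    majority = majority_term(set_a, set_b)
    return alpha * minority + (1 - alpha) * majority


# Two well-separated toy sets; set_b_noisy adds one outlier close to set_a.
set_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
set_b = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
set_b_noisy = set_b + [(1.0, 1.0)]

print(majority_term(set_a, set_b), majority_term(set_a, set_b_noisy))
print(minority_term(set_a, set_b, k=1), minority_term(set_a, set_b_noisy, k=1))
```

In this toy case the outlier collapses the closest-pair minority term from about 13.5 to 1.0, while the median-based majority term is unchanged, which is the kind of robustness to irregular outliers that the contribution describes.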

The above contributions are based on but extend our previous work [15], [16], with further scope of elaboration and experimentation.

Section snippets

Proposed method: problem definition and overview

Generally speaking, person re-identification can be divided into two research directions: single-shot re-identification and multiple-shot re-identification. In the single-shot case, there is only one image for each identity on the query and corpus sides. In the multiple-shot case, there are several images, which form an image set, for each identity in the query and corpus domains. The set concept is therefore central to multiple-shot re-identification.

Set-to-set distance crafting

To correctly match the sets, an appropriate set-to-set distance is important. Most previous methods have focused on minority-based distance and claimed the effectiveness of this strategy. Minority-based distance takes within-set variability into account by measuring the closest local minorities of each pair of sets. Two exemplary methods are Minimum Point-wise Distance (MPD) and Convex Hull based Image Set Distance (CHISD) [17]. MPD measures the minimum point-wise distance between sample
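
As a minimal sketch of the minority-based idea, MPD can be computed as the smallest distance over all cross-set sample pairs. The Euclidean point distance, the function name `mpd`, and the toy feature vectors are assumptions for illustration, not details taken from the paper.

```python
import math
from itertools import product


def mpd(set_a, set_b):
    """Minimum point-wise distance between two non-empty sets of feature vectors."""
    if not set_a or not set_b:
        raise ValueError("both sets must be non-empty")
    # Assumption: Euclidean distance between individual samples.
    return min(math.dist(a, b) for a, b in product(set_a, set_b))


# Toy 2-D features for two identities (hypothetical values).
query_set = [(0.0, 0.0), (1.0, 0.0)]
corpus_set = [(4.0, 4.0), (1.0, 3.0)]
print(mpd(query_set, corpus_set))  # closest pair is (1, 0) and (1, 3)
```

Because only the single closest pair matters, one stray sample lying near the other set is enough to make two different identities look similar.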

Experimental setup

We set up experiments on several public benchmark datasets of whole-body cues, including i-LIDS-MA, i-LIDS-AA [5], and Caviar4REID [27]. All of them have multiple images with spatio-temporal variations for each identity, as shown in Fig. 4.

The i-LIDS-MA and i-LIDS-AA datasets are obtained from the videos of i-LIDS MCTS captured by a multi-camera CCTV network at an airport arrival hall during a busy time. From these videos, i-LIDS-MA [5] is made of manually annotated individual images, while

Conclusions

This paper has proposed a novel method referred to as “LBDM” to address the problem of multiple-shot human re-identification. In LBDM, a new set-to-set dissimilarity was crafted and its discriminability was exploited by LMF to help enhance the matching ability of SCNNM. Results have demonstrated the reliability and superiority of LBDM for multiple-shot person re-identification. Future work will cover, but is not limited to, developing the LBDM application scope to cross-camera tracking and

Wei Li is with the School of Instrument Science and Engineering, Southeast University, China. He received the bachelor's and master's degrees from Southeast University in 2007 and 2010, respectively, and the doctoral degree from Kyoto University in 2014. His research interests include computer vision, pattern recognition, and machine learning.

References (31)

L. Lin et al., Representing and recognizing objects with massive local image patches, Pattern Recognit. (2012)
J. Wright et al., Robust face recognition via sparse representation, IEEE Trans. Pattern Anal. Mach. Intell. (2009)
L. Zhang, M. Yang, X. Feng, Sparse representation or collaborative representation: Which helps face recognition? in:...
M. Farenzena, L. Bazzani, A. Perina, V. Murino, M. Cristani, Person re-identification by symmetry-driven accumulation...
B. Loris, C. Marco, P. Alessandro, F. Michela, M. Vittorio, Multiple-shot person re-identification by hpe signature,...
B. Slawomir, C. Etienne, B. Francois, T. Monique, Multiple-shot human re-identification by mean riemannian covariance...
Y. Wu, M. Minoh, M. Mukunoki, S. Lao, Robust object recognition via third-party collaborative representation, in:...
Y. Xu, L. Lin, W.-S. Zheng, X. Liu, Human re-identification by matching compositional template with cluster sampling,...
W. Li, R. Zhao, T. Xiao, X. Wang, Deepreid: deep filter pairing neural network for person re-identification, in:...
Y. Wu, M. Minoh, M. Mukunoki, W. Li, S. Lao, Collaborative sparse approximation for multiple-shot across-camera person...
Y. Wu, M. Minoh, M. Mukunoki, Collaboratively regularized nearest points for set based recognition, in: Proceedings of...
Y. Wu, M. Minoh, M. Mukunoki, S. Lao, Set based discriminative ranking for recognition, in: Proceedings of the 12th...
W. Li et al., Bi-level relative information analysis for multiple-shot person re-identification, IEICE Trans. Inf. Syst. (2013)
X. Liu, M. Song, D. Tao, X. Zhou, C. Chen, J. Bu, Semi-supervised coupled dictionary learning for person...
S. Pedagadi, J. Orwell, S. Velastin, B. Boghossian, Local fisher discriminant analysis for pedestrian...


Yang Wu is with the Institute for Research Initiatives, Nara Institute of Science and Technology, Japan. He received a B.S. degree and a Ph.D. degree from Xi'an Jiaotong University in 2004 and 2010, respectively. From September 2007 to December 2008, he was a visiting student at the University of Pennsylvania. From 2011 to 2014, he was a program-specific researcher at the Academic Center for Computing and Media Studies, Kyoto University. His research is in the fields of computer vision, pattern recognition, image/video search and retrieval, and machine learning.

Masayuki Mukunoki is a professor in Faculty of Engineering, University of Miyazaki, Japan. He received the bachelor's, master's, and doctoral degrees in Information Science from Kyoto University. His research interests include computer vision, video media processing, lecture video analysis and human activity sensing with camera. He is a member of Information Processing Society of Japan and Institute of Electronics, Information and Communication Engineers of Japan.

Yinghui Kuang is a professor and the vice-dean of Chien-Shiung Wu College, Southeast University, China. She received the bachelor's and master's degrees from Xi'an Jiaotong University, and the doctoral degree from Southeast University. Her research interests mainly include measurement and control technology and intelligent systems.

Michihiko Minoh is a professor at Academic Center for Computing and Media Studies, Kyoto University, Japan. He received the B.Eng., M.Eng., and D.Eng. degrees from Kyoto University, in 1978, 1980 and 1983, respectively. Since October 2010, he has been vice-president, chief information officer at Kyoto University, and director-general at Institute for Information Management and Communication, Kyoto University. His research interests include image processing, artificial intelligence and multimedia applications. He is a member of IEEE, ACM, etc.
