Neurocomputing

Volume 307, 13 September 2018, Pages 25-37

Person re-identification by enhanced local maximal occurrence representation and generalized similarity metric learning

https://doi.org/10.1016/j.neucom.2018.04.013

Abstract

To solve the challenging person re-identification problem, great efforts have been devoted to feature representation and metric learning. However, existing feature extractors are either stripe-based or dense-block-based, so fine details and coarse appearance are not well integrated. Moreover, metrics are generally learned from either the distance view or the bilinear similarity view alone; few works have exploited the complementary effects of combining them. To address these issues, we propose a new feature representation termed enhanced Local Maximal Occurrence (eLOMO), which fuses a new overlapping-stripe-based descriptor with the Local Maximal Occurrence (LOMO) feature extracted from dense blocks. This integration makes eLOMO resemble the coarse-to-fine recognition mechanism of the human visual system, providing a more discriminative descriptor for re-identification. In addition, we show the advantages of learning a generalized similarity that combines the Mahalanobis distance and the bilinear similarity. Specifically, we derive a logistic metric learning method to jointly learn a distance metric and a bilinear similarity metric, which exploits both the distance and angle information in the training data. By learning in the intra-class subspace, the proposed method can be solved efficiently by coordinate descent optimization. Experiments on four challenging datasets, including VIPeR, PRID450S, QMUL GRID, and CUHK01, show that the proposed method significantly outperforms state-of-the-art approaches.

Introduction

Person re-identification is the task of matching individuals across disjoint camera views over distributed spaces, which plays an important role in intelligent video surveillance. Although it is assumed that people do not change clothes in different camera views, person re-identification still remains a challenging problem due to large appearance variations caused by illumination, pose, viewpoint, and occlusion.

Great efforts have been devoted for years to tackling person re-identification along two directions. One is to design visual descriptors that are robust against cross-view variations, and the other is to learn a discriminant similarity/distance function to determine whether an image pair belongs to the same person or not. For visual descriptors, a number of feature representations have been proposed, such as Symmetry-Driven Accumulation of Local Features (SDALF) [1], Mid-level Filter (MLF) [2], Biologically Inspired Features (BIF) [3], Salient Color Names (SCN) [4], Local Maximal Occurrence (LOMO) [5], and the Gaussian of Gaussian (GOG) descriptor [6]. Most of them are extracted from either horizontal stripes or dense blocks. Although impressive advances have been made, designing a more robust yet discriminative descriptor remains an open problem.

As for similarity/distance function learning, a number of metric learning algorithms have been devised [5], [7], [8], [9], [10], [11], [12], [13], [14]. Some of them, like [10], [11], [13], [14], focus on learning a Mahalanobis distance metric from distance constraints, while others, like [7], [12], seek a bilinear similarity metric by exploiting the angle information between instances in high-dimensional feature space. However, most of these works fail to exploit the complementary effects of combining the two views, and considering only one of them may lead to a less discriminative similarity measure.

In this paper, we propose an efficient feature representation termed enhanced Local Maximal Occurrence (eLOMO) and a Generalized Similarity Metric Learning (GSML) method for person re-identification. The eLOMO integrates a new overlapping-stripe-based descriptor with the existing LOMO [5] feature. The stripe-based descriptor better exploits coarse appearance information from larger regions, while LOMO is good at capturing the fine details of dense blocks; their fusion therefore yields a coarse-to-fine representation in line with the human recognition mechanism. To learn a discriminant similarity function, we combine the Mahalanobis distance and the bilinear similarity, so that the distance and angle information of the training data are exploited simultaneously. The proposed method is formulated as a logistic metric learning problem with Positive Semi-definite (PSD) constraints, and we derive an efficient coordinate descent algorithm to solve it based on the Accelerated Proximal Gradient (APG) optimization method. To suppress the large intra-class variations of cross-view appearances, we project samples into the intra-class subspace before learning. The pipeline of the proposed method is shown in Fig. 1.
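As a concrete illustration, such a combined measure can be thought of as a bilinear similarity term minus a squared Mahalanobis distance term. The following is a minimal NumPy sketch of that form, assuming hypothetical matrices M (PSD, for the distance term) and W (for the bilinear term); it is not the paper's exact objective or its learned solution.

```python
import numpy as np

def generalized_similarity(x, z, M, W):
    """Combined measure: bilinear similarity minus squared Mahalanobis distance.

    M is assumed positive semi-definite (distance term); W carries the
    angle/bilinear information. Illustrative form only.
    """
    diff = x - z
    return x @ W @ z - diff @ M @ diff

# Toy example in 3-D: identity M and W reduce the measure to
# inner product minus squared Euclidean distance.
rng = np.random.default_rng(0)
x = rng.normal(size=3)
z = x + 0.1 * rng.normal(size=3)   # near-duplicate of x (a positive pair)
M = np.eye(3)
W = np.eye(3)
score_pos = generalized_similarity(x, z, M, W)
score_neg = generalized_similarity(x, -z, M, W)  # dissimilar pair
print(score_pos > score_neg)       # the positive pair scores higher
```

With identity matrices the measure degenerates to a standard inner product penalized by Euclidean distance; the point of GSML is to learn M and W jointly so that both terms become discriminative.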

We conduct extensive experiments to validate the efficacy of the proposed method. Experimental results show that the proposed method achieves significant improvements over existing approaches on four challenging person re-identification datasets, namely VIPeR [15], PRID450S [16], QMUL GRID [17], and CUHK01 [2].

The rest of this paper is organized as follows. In Section 2 we briefly review the related works and discuss their differences with our method. Section 3 introduces the eLOMO feature representation. Section 4 presents the GSML in detail. The experimental results and the analysis of our method are reported in Section 5. Finally, we draw some conclusions in Section 6.

Section snippets

Related work

Given one probe image containing an individual of interest, the task of person re-identification is to find its true match (or usually the best match) from a large number of gallery images. Existing works for solving this problem generally follow a two-step paradigm. Firstly, a robust and distinctive feature representation is extracted for every pedestrian image. Secondly, the similarity/distance for each probe-gallery image pair is measured by a certain metric, which is then used to rank the …

Enhanced local maximal occurrence representation

Similar to the coarse-to-fine recognition mechanism of the human visual system, a discriminative feature representation for visual learning should take both fine details and holistic appearance information into consideration. The advantage is that they can work cooperatively to capture the invariance of pedestrian appearance across camera views, which greatly helps to identify the target of interest. Although some descriptors like LOMO and GOG have considered computing …
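To make the stripe-based side of such a representation concrete, the sketch below computes a gray-level histogram over overlapping horizontal stripes of a pedestrian image. The parameters (`n_stripes`, `overlap`, `n_bins`) are hypothetical illustrations of the overlapping-stripe idea; the actual eLOMO descriptor uses richer color and texture channels and maximal-occurrence pooling, which are not reproduced here.

```python
import numpy as np

def overlapping_stripe_histograms(image, n_stripes=6, overlap=0.5, n_bins=16):
    """Gray-level histogram per overlapping horizontal stripe.

    `image` is a 2-D array of pixel values in [0, 256). Hypothetical
    parameters; a sketch of the overlapping-stripe idea, not eLOMO itself.
    """
    h = image.shape[0]
    # Stripe height chosen so that n_stripes stripes with the given
    # overlap roughly cover the image height.
    stripe_h = int(h / (n_stripes - (n_stripes - 1) * overlap))
    step = max(int(stripe_h * (1 - overlap)), 1)
    feats = []
    for top in range(0, h - stripe_h + 1, step):
        stripe = image[top:top + stripe_h]
        hist, _ = np.histogram(stripe, bins=n_bins, range=(0, 256))
        feats.append(hist / hist.sum())   # L1-normalize each stripe
    return np.concatenate(feats)
```

Because adjacent stripes share half their rows, appearance cues near stripe boundaries contribute to two local descriptors instead of being split between them, which is the motivation for overlapping rather than disjoint stripes.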

Problem formulation

Let {X, Z, Y} be a given cross-view training set, where X ∈ ℝ^{d×n} and Z ∈ ℝ^{d×m} are the feature matrices of the probe set and the gallery set, with n and m samples in a d-dimensional feature space, respectively; Y ∈ ℝ^{n×m} is the matching label matrix between X and Z, with y_ij = 1 if (x_i, z_j) is a positive pair (i.e., x_i and z_j represent the same person), and y_ij = −1 otherwise. The re-identification task is to learn a similarity function f(x_i, z_j) to measure the similarity between each pair {(x_i, z_j)}_{i,j=1}^{n,m}.
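The matching label matrix Y follows directly from the person identities of the probe and gallery images. A minimal sketch, using a hypothetical helper `matching_labels` introduced only for illustration:

```python
import numpy as np

def matching_labels(probe_ids, gallery_ids):
    """Build the label matrix Y with y_ij = +1 when probe i and gallery j
    share the same person identity, and y_ij = -1 otherwise.

    `matching_labels` is a hypothetical helper for illustration only.
    """
    p = np.asarray(probe_ids)[:, None]    # shape (n, 1)
    g = np.asarray(gallery_ids)[None, :]  # shape (1, m)
    return np.where(p == g, 1, -1)        # broadcast comparison to (n, m)

# Three probe identities matched against four gallery images.
Y = matching_labels([1, 2, 3], [3, 1, 2, 1])
print(Y)
```

Each row of Y corresponds to one probe image, so positive pairs are sparse: in the single-shot setting every row contains exactly one +1 entry.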

Experiments

We evaluate the proposed method on four widely used person re-identification datasets: VIPeR [15], PRID450S [16], QMUL GRID [17], and CUHK01 [2]. Fig. 5 shows some image pairs randomly selected from these datasets. Performance is evaluated by the Cumulative Matching Characteristic (CMC) curve, which represents the expectation of finding the right match within the top r ranks. To obtain robust performance for comparison, we repeat the experimental procedure 10 times with random …
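The CMC curve itself is straightforward to compute from a probe-gallery score matrix. A minimal single-shot sketch (assuming exactly one true match per probe), with toy inputs that are not from any of the datasets above:

```python
import numpy as np

def cmc_curve(scores, labels):
    """Cumulative Matching Characteristic from a probe-gallery score matrix.

    scores : (n_probe, n_gallery) similarities, higher means more similar.
    labels : (n_probe, n_gallery) booleans, True where identities match.
    Returns cmc where cmc[r-1] is the fraction of probes whose true match
    appears within the top r ranks (single-shot: one true match per probe).
    """
    n_probe, n_gallery = scores.shape
    ranks = np.empty(n_probe, dtype=int)
    for i in range(n_probe):
        order = np.argsort(-scores[i])                 # best match first
        ranks[i] = np.flatnonzero(labels[i][order])[0]  # rank of true match
    return np.cumsum(np.bincount(ranks, minlength=n_gallery)) / n_probe

# Toy example: two probes against a three-image gallery.
scores = np.array([[0.9, 0.2, 0.1],
                   [0.3, 0.8, 0.4]])
labels = np.array([[True, False, False],
                   [False, False, True]])
print(cmc_curve(scores, labels))   # [0.5, 1.0, 1.0]
```

Here the first probe's true match is ranked first and the second probe's is ranked second, giving a rank-1 rate of 50% and a rank-2 rate of 100%.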

Conclusion

In this paper, we have proposed a discriminative and robust feature representation termed eLOMO, and an effective metric learning method called GSML for person re-identification. The eLOMO fuses the features extracted from both horizontal stripes and dense blocks, such that the fine details and holistic appearance information can be integrated together to enhance the discrimination. The proposed GSML jointly learns a Mahalanobis distance metric and a bilinear similarity metric to simultaneously …

Acknowledgment

This work was partially supported by the National Natural Science Foundation of China (NSFC Grant Nos. 61773272, 61272258, 61301299, 61572085, 61170124, 61272005), the Provincial Natural Science Foundation of Jiangsu (Grant Nos. BK20151254, BK20151260), the Science and Education Innovation based Cloud Data Fusion Foundation of the Science and Technology Development Center of the Ministry of Education (2017B03112), the Six Talent Peaks Project in Jiangsu Province (DZXX-027), the Key Laboratory of Symbolic Computation and …

References (58)

  • B. Ma et al.

    Covariance descriptor based on bio-inspired features for person re-identification and face verification

    Image Vis. Comput.

    (2014)
  • A. Bedagkar-Gala et al.

    A survey of approaches and trends in person re-identification

    Image Vis. Comput.

    (2014)
  • M. Farenzena et al.

    Person re-identification by symmetry-driven accumulation of local features

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2010)
  • R. Zhao et al.

    Learning mid-level filters for person re-identification

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2014)
  • Y. Yang et al.

    Salient color names for person re-identification

    Proceedings of the European Conference on Computer Vision

    (2014)
  • S. Liao et al.

    Person re-identification by local maximal occurrence representation and metric learning

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2015)
  • T. Matsukawa et al.

    Hierarchical Gaussian descriptor for person re-identification

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2016)
  • J. Chen et al.

    Relevance metric learning for person re-identification by exploiting listwise similarities

    IEEE Trans. Image Process.

    (2015)
  • M. Hirzer et al.

    Relaxed pairwise learned metric for person re-identification

    Proceedings of the European Conference on Computer Vision

    (2012)
  • M. Köstinger et al.

    Large scale metric learning from equivalence constraints

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2012)
  • S. Liao et al.

    Efficient PSD constrained asymmetric metric learning for person re-identification

    Proceedings of the IEEE International Conference on Computer Vision

    (2015)
  • A. Mignon et al.

    PCCA: a new approach for distance learning from sparse pairwise constraints

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2012)
  • H.V. Nguyen et al.

    Cosine similarity metric learning for face verification

    Proceedings of the Asian Conference on Computer Vision

    (2010)
  • K.Q. Weinberger et al.

    Distance metric learning for large margin nearest neighbor classification

    J. Mach. Learn. Res.

    (2009)
  • W.-S. Zheng et al.

    Reidentification by relative distance comparison

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2013)
  • D. Gray et al.

    Viewpoint invariant pedestrian recognition with an ensemble of localized features

    Proceedings of the European Conference on Computer Vision

    (2008)
  • P.M. Roth et al.

    Mahalanobis Distance Learning for Person Re-identification

    (2014)
  • C.C. Loy et al.

    Multi-camera activity correlation analysis

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2009)
  • G. Doretto et al.

    Appearance-based person reidentification in camera networks: problem overview and current approaches

    J. Ambient Intell. Humaniz. Comput.

    (2011)
  • S. Gong et al.

    Person Re-identification

    (2014)
  • L. Zheng, Y. Yang, A.G. Hauptmann, Person re-identification: past, present and future, arXiv:1610.02984...
  • Y.C. Chen et al.

    Mirror representation for modeling view-specific transform in person re-identification

    Proceedings of the International Conference on Artificial Intelligence

    (2015)
  • G. Lisanti et al.

    Person re-identification by iterative re-weighted sparse ranking

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2015)
  • Z. Li et al.

    Learning locally-adaptive decision functions for person verification

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2013)
  • R. Zhao et al.

    Person re-identification by salience matching

    Proceedings of the IEEE International Conference on Computer Vision

    (2013)
  • S.Z. Chen et al.

    Deep ranking for person re-identification via joint representation learning

    IEEE Trans. Image Process.

    (2016)
  • T. Xiao et al.

    Learning deep feature representations with domain guided dropout for person re-identification

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2016)
  • Y.C. Chen, X. Zhu, W.S. Zheng, J.H. Lai, Person re-identification by camera correlation aware feature augmentation,...
  • C. Sun et al.

    Person re-identification via distance metric learning with latent variables

    IEEE Trans. Image Process.

    (2017)

    Husheng Dong received his M.S. degree from the School of Computer Science & Technology, Soochow University, in 2008, and is now pursuing the Ph.D. degree. He is also a teacher at the Suzhou Institute of Trade & Commerce. His research interests include computer vision, image and video processing, and machine learning.

    Ping Lu received her B.Eng. and M.S. degrees from the School of Computer Science and Technology, Soochow University, in 2002 and 2005, respectively. She is an associate professor at the Suzhou Institute of Trade & Commerce. Her research interests include digital image processing and pattern recognition.

    Shan Zhong received her M.S. and Ph.D. degrees from Jiang University (2007) and Soochow University (2017), respectively. She is now a teacher at the Changshu Institute of Technology. Her research interests include machine learning and deep learning.

    Chunping Liu received her Ph.D. degree in pattern recognition and artificial intelligence from Nanjing University of Science & Technology in 2002. She is now a professor in the School of Computer Science & Technology, Soochow University. Her research interests include computer vision, image analysis and recognition, in particular in the domains of visual saliency detection, object detection and recognition, and scene understanding.

    Yi Ji received her M.S. degree from the National University of Singapore and her Ph.D. degree from INSA de Lyon, France. She is now an associate professor in the School of Computer Science & Technology of Soochow University. Her research areas are 3D action recognition and complex scene understanding.

    Shengrong Gong received his M.S. degree from Harbin Institute of Technology in 1993, and his Ph.D. degree from Beihang University in 2001. He is the dean of School of Computer Science and Engineering, Changshu Institute of Technology, and a professor and doctoral supervisor of School of Computer Science & Technology, Soochow University. His research interests include image and video processing, pattern recognition, and computer vision.
