Elsevier

Pattern Recognition

Volume 94, October 2019, Pages 53-61
AlignedReID++: Dynamically matching local information for person re-identification

https://doi.org/10.1016/j.patcog.2019.05.028

Highlights

  • We propose a new method named DMLI that can dynamically match horizontal stripes without requiring extra supervision or explicit pose estimation.

  • We introduce a local branch based on DMLI and design a novel framework called AlignedReID++, which can guide the global branch to learn more discriminative global features.

  • Experimental results demonstrate that the proposed approach achieves competitive results in both rank-1 accuracy and mAP on Market1501, DukeMTMCReID, CUHK03 and MSMT17 databases.

Abstract

Person re-identification (ReID) is a challenging problem in which global features of person images are not enough to handle unaligned image pairs. Many previous works use human pose information to acquire aligned local features and boost performance. However, those methods need extra labeled data to train a usable human pose estimation model. In this paper, we propose a novel method named Dynamically Matching Local Information (DMLI) that dynamically aligns local information without requiring extra supervision. DMLI achieves better performance, especially under the human pose misalignment caused by inaccurate person detection boxes. We then propose a deep model named AlignedReID++, which jointly learns global features and DMLI-based local features. AlignedReID++ improves the performance of the global features, and can use DMLI to further increase accuracy in the inference phase. Experiments show the effectiveness of the proposed method in comparison with several state-of-the-art person ReID approaches. In particular, it achieves rank-1 accuracy of 92.8% on Market1501 and 86.2% on DukeMTMCReID with ResNet50. The code and models have been released (see footnote 2).

Introduction

Person re-identification (ReID), which identifies a person of interest across a non-overlapping camera system, is a challenging task in computer vision. Most previous works focus on learning global features of a person using Convolutional Neural Networks (CNNs), which are trained with either a straightforward classification loss or a deep metric learning loss [1]. These global-feature-based methods struggle with problems such as variations in pose, viewpoint, illumination, occlusion, etc. Some hard examples are shown in Fig. 1. To learn better local features, some works [2], [3], [4], [5], [6] apply a human pose estimation model to acquire human pose points, which are used to match different human parts or align viewpoints. However, training a human pose estimation model needs a lot of labeled data, and acquiring human pose heatmaps needs much more GPU memory. On the other hand, some methods [7], [8] use horizontal stripes or grids to extract local features of each body part. However, directly matching stripes or grids requires strict person alignment beforehand to get better performance.

In this paper, we propose a novel method named Dynamically Matching Local Information (DMLI) that can dynamically align horizontal stripes without requiring extra supervision or explicit pose estimation. The proposed method can be easily applied to almost all CNN-based person ReID frameworks. To a certain extent, DMLI can handle the human pose variations caused by inaccurate person detection boxes, changed viewpoints, occlusions, etc. We then propose a novel ReID framework called AlignedReID++, which learns a global feature jointly with a local branch based on DMLI. In the local branch, we align local parts by introducing a shortest path distance. We find that the local branch can guide the global branch to learn more discriminative global features. In the inference stage, combining global and local features (DMLI) can slightly improve the accuracy further. In addition, the better global features keep our approach attractive for deploying a large ReID system without costly local feature matching.
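The idea of aligning horizontal stripes with a shortest path distance can be illustrated with a small dynamic program. The following is only a minimal sketch under stated assumptions, not the released implementation: the function names `stripe_distance_matrix` and `shortest_path` are hypothetical, the squashing of stripe distances into [0, 1) and the restriction to right/down moves are assumptions that the text above does not specify, and each image is assumed to be already pooled into H stripe feature vectors.

```python
import numpy as np

def stripe_distance_matrix(f, g):
    """Pairwise distances between the horizontal stripes of two images.

    f, g: arrays of shape (H, C), one feature vector per stripe.
    Assumption of this sketch: Euclidean distances d are squashed with
    tanh(d / 2), which equals (e^d - 1) / (e^d + 1) and lies in [0, 1).
    """
    d = np.linalg.norm(f[:, None, :] - g[None, :, :], axis=2)  # (H, H)
    return np.tanh(d / 2.0)

def shortest_path(dist):
    """Cheapest path from the top-left to the bottom-right cell of `dist`.

    Assumption of this sketch: the path may only move right or down.
    Returns (total cost, list of (i, j) stripe pairs along the path).
    """
    h, w = dist.shape
    cost = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            if i == 0 and j == 0:
                best = 0.0
            elif i == 0:
                best = cost[i, j - 1]
            elif j == 0:
                best = cost[i - 1, j]
            else:
                best = min(cost[i - 1, j], cost[i, j - 1])
            cost[i, j] = best + dist[i, j]

    # Walk back from the last cell to recover which stripes were paired.
    i, j = h - 1, w - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        if i == 0:
            j -= 1
        elif j == 0:
            i -= 1
        elif cost[i - 1, j] <= cost[i, j - 1]:
            i -= 1
        else:
            j -= 1
        path.append((i, j))
    return float(cost[-1, -1]), path[::-1]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    f = rng.normal(size=(8, 128))  # 8 stripes, 128-d features (illustrative sizes)
    g = rng.normal(size=(8, 128))
    total, path = shortest_path(stripe_distance_matrix(f, g))
    print(total, path)
```

In this sketch the total path cost plays the role of the local distance between two images, and the recovered path lists which stripe of one image is paired with which stripe of the other.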

Beyond the proposed method, extensive experimental results also reveal some interesting phenomena, which may give other researchers some inspiration. We find that high-level feature maps are more suitable for aligning local parts, although they have a nearly “global” receptive field. In addition, we manually design two partial ReID datasets to simulate inaccurate person detection bounding boxes. In this special case, unaligned local features significantly harm re-identification.

Finally, by combining global and local features, AlignedReID++ achieves state-of-the-art results on several datasets including Market1501 [9], DukeMTMCReID [10], CUHK03 [7] and MSMT17 [11]. Using global features alone, our method also achieves competitive performance. In summary, our contributions are threefold:

  • We propose a new method named DMLI that can dynamically match horizontal stripes without requiring extra supervision or explicit pose estimation.

  • We introduce a local branch based on DMLI and design a novel framework called AlignedReID++, which can guide the global branch to learn more discriminative global features.

  • Experimental results demonstrate that the proposed approach achieves competitive results in both rank-1 accuracy and mAP on Market1501, DukeMTMCReID, CUHK03 and MSMT17 databases.
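For readers unfamiliar with the two metrics named above, the sketch below shows one common way to compute rank-1 accuracy and mAP from a query-to-gallery distance matrix. It is a simplified illustration, not the evaluation code used for the reported numbers: the function name `rank1_and_map` is hypothetical, and the usual benchmark step of excluding gallery images taken by the same camera as the query is deliberately omitted.

```python
import numpy as np

def rank1_and_map(dist, query_ids, gallery_ids):
    """Rank-1 accuracy and mean average precision for a retrieval task.

    dist: (Q, G) array of query-to-gallery distances (smaller = more similar).
    query_ids, gallery_ids: person identity labels for queries and gallery.
    Simplification: no same-camera filtering is applied.
    """
    query_ids = np.asarray(query_ids)
    gallery_ids = np.asarray(gallery_ids)
    rank1_hits, average_precisions = [], []
    for q in range(dist.shape[0]):
        order = np.argsort(dist[q])                  # closest gallery image first
        matches = gallery_ids[order] == query_ids[q]
        if not matches.any():
            continue                                 # no correct gallery image for this query
        rank1_hits.append(float(matches[0]))
        hit_ranks = np.flatnonzero(matches) + 1      # 1-based ranks of correct matches
        precisions = np.arange(1, len(hit_ranks) + 1) / hit_ranks
        average_precisions.append(precisions.mean())
    return float(np.mean(rank1_hits)), float(np.mean(average_precisions))

if __name__ == "__main__":
    dist = np.array([[0.1, 0.9, 0.5],
                     [0.8, 0.2, 0.3]])
    print(rank1_and_map(dist, query_ids=[1, 1], gallery_ids=[1, 2, 1]))
```

Rank-1 only asks whether the closest gallery image has the right identity, while mAP also rewards ranking all correct gallery images near the top, which is why the two are usually reported together.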

Section snippets

Deep person ReID

In recent years many deep learning based person ReID methods [12], [13], [14] have been proposed and achieved surprising performance. The methods can be divided into the representation learning and the metric learning methods. The representation learning methods [15], [16], [17], [18], [19], [20], [21] regard the person ReID task as a classification problem trained with softmax loss (ID loss). In ID Embedding Net (IDENet) [15], [22], each person ID is considered a category of the classification

Methods

In this section, we describe the details of Dynamically Matching Local Information and the structure of AlignedReID++ shown in Fig. 3.

Datasets

In this section, we introduce the person ReID datasets used in the paper. Since Market1501 [9] and DukeMTMCReID [10] are the most popular datasets in ReID research, we mainly use these two datasets for the ablation study. To better demonstrate the effectiveness of the proposed DMLI, we manually construct two partial ReID datasets named Market1501-Partial and DukeMTMCReID-Partial. Additionally, we report the performance of AlignedReID++ on MSMT17 [11] and CUHK03 [7].

Visualization of DMLI

For the image pair in Fig. 2, we show the alignment computed by our DMLI method in Fig. 4. Because the upper part of the right image mainly contains background, the feature distances between the first four blocks of the two images are very large. However, both the first stripe of the left image and the fifth stripe of the right image contain the human head, so the feature distance between them is small and DMLI connects these two parts. In addition, DMLI reasonably aligns chest, leg, foot, etc of

Conclusions and future works

In this paper, we propose DMLI, a method for dynamically matching local information that aligns the local stripes of a person image pair without additional supervision, for deep person re-identification. DMLI is easily introduced into any CNN-based ReID method, and handles pose misalignment and other hard samples well, especially partial ReID caused by erroneous detection bounding boxes. We then propose the AlignedReID++ framework by adding a DMLI-based local branch to a CNN.

Acknowledgment

This research is supported by the National Natural Science Foundation of China (61633019).

Hao Luo received the BSc degree in Automation from Zhejiang University, PR China, in 2015. Currently, he is doing his PhD at Zhejiang University with interests in pattern recognition, computer vision and deep learning.

References (55)

  • L. Wei et al.

    Glad: Global-local-alignment descriptor for pedestrian retrieval

    Proceedings of the 2017 ACM on Multimedia Conference

    (2017)
  • C. Su et al.

    Pose-driven deep convolutional model for person re-identification

    Computer Vision (ICCV), 2017 IEEE International Conference on

    (2017)
  • W. Li et al.

    Deepreid: Deep filter pairing neural network for person re-identification

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2014)
  • D. Cheng et al.

    Person re-identification by multi-channel parts-based cnn with improved triplet loss function

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2016)
  • L. Zheng et al.

    Scalable person re-identification: a benchmark

    Computer Vision, IEEE International Conference

    (2015)
  • E. Ristani et al.

    Performance measures and a data set for multi-target, multi-camera tracking

    European Conference on Computer Vision workshop on Benchmarking Multi-Target Tracking

    (2016)
  • L. Wei et al.

    Person transfer gan to bridge domain gap for person re-identification

    CVPR

    (2018)
  • Z. Zheng et al.

    A discriminatively learned cnn embedding for person reidentification

    ACM Trans. Multimedia Comput. Commun. Appl. (TOMM)

    (2017)
  • E. Ristani et al.

    Features for multi-target multi-camera tracking and re-identification

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2018)
  • H. Shi et al.

    Embedding deep metric for person re-identification: a study against large variations

    European Conference on Computer Vision

    (2016)
  • L. Zheng et al.

    Person re-identification in the wild

    The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

    (2017)
  • H. Chen et al.

    Deep transfer learning for person re-identification

    2018 IEEE Fourth International Conference on Multimedia Big Data (BigMM)

    (2018)
  • T. Matsukawa et al.

    Person re-identification using cnn features learned from combination of attributes

    Pattern Recognition (ICPR), 2016 23rd International Conference on

    (2016)
  • Y. Lin, L. Zheng, Z. Zheng, Y. Wu, Y. Yang, Improving person re-identification by attribute and identity learning,...
  • Y. Sun et al.

    Beyond part models: person retrieval with refined part pooling (and a strong convolutional baseline)

    The European Conference on Computer Vision (ECCV)

    (2018)
  • L. Zheng et al.

    Person re-identification: past, present and future

    (2017)
  • Y. Wang et al.

    Person re-identification with cascaded pairwise convolutions

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2018)

    Wei Jiang received the PhD degree in Pattern Recognition from Tokyo Institute of Technology, Japan. He is currently an Associate Professor in the Institute of Cyber-Systems and Control, Zhejiang University. His current research interests include large-scale pattern recognition, computer vision, deep learning, and control systems.

    This paper is the extended version of our unsubmitted preprint paper [1].

    1

    Zhejiang University, 2Megvii Inc (Face++)

    2

    https://github.com/michuanhaohao/AlignedReID
