AlignedReID++: Dynamically matching local information for person re-identification☆
Introduction
Person re-identification (ReID), which aims to identify a person of interest across non-overlapping camera systems, is a challenging task in computer vision. Most previous works focused on learning global features of a person using Convolutional Neural Networks (CNNs), which are trained with either a straightforward classification loss or a deep metric learning loss [1]. These global-feature-based methods struggle with problems such as variations in pose, viewpoint and illumination, as well as occlusion. Some hard examples are shown in Fig. 1. To learn better local features, some works [2], [3], [4], [5], [6] apply a human pose estimation model to acquire human pose points, which are used to match different human parts or align viewpoints. However, training a human pose estimation model requires a lot of labeled data, and acquiring human pose heatmaps requires much more GPU memory. On the other hand, some methods [7], [8] use horizontal stripes or grids to extract local features of each body part. However, directly matching stripes or grids requires strict person alignment beforehand to achieve good performance.
In this paper, we propose a novel method named Dynamically Matching Local Information (DMLI) that can dynamically align horizontal stripes without requiring extra supervision or explicit pose estimation. The proposed method can be easily applied to almost all CNN-based person ReID frameworks. To a certain extent, DMLI can handle the human pose misalignments caused by inaccurate person detection boxes, viewpoint changes, occlusions, etc. We then propose a novel ReID framework called AlignedReID++, which learns a global feature jointly with a local branch based on DMLI. In the local branch, we align local parts by introducing a shortest path distance. We find that the local branch can guide the global branch to learn more discriminative global features. In the inference stage, combining global and local features (DMLI) can slightly improve the accuracy further. In addition, the better global features keep our approach attractive for the deployment of a large ReID system, without costly local feature matching.
Beyond the proposed method, extensive experimental results also reveal some interesting phenomena that may inspire other researchers. We find that high-level feature maps are more suitable for aligning local parts, although they have a nearly “global” receptive field. In addition, we manually design two partial ReID datasets to simulate inaccurate person detection bounding boxes. In this special case, unaligned local features significantly harm re-identification.
Finally, combining global and local features, AlignedReID++ achieves state-of-the-art results on several datasets, including Market1501 [9], DukeMTMCReID [10], CUHK03 [7] and MSMT17 [11]. Using global features alone, our method also achieves competitive performance. In summary, our contributions are threefold:
- We propose a new method named DMLI that can dynamically match horizontal stripes without requiring extra supervision or explicit pose estimation.
- We introduce a local branch based on DMLI and design a novel framework called AlignedReID++, in which the local branch can guide the global branch to learn more discriminative global features.
- Experimental results demonstrate that the proposed approach achieves competitive results in both rank-1 accuracy and mAP on the Market1501, DukeMTMCReID, CUHK03 and MSMT17 datasets.
Section snippets
Deep person ReID
In recent years, many deep learning based person ReID methods [12], [13], [14] have been proposed and have achieved surprising performance. These methods can be divided into representation learning methods and metric learning methods. The representation learning methods [15], [16], [17], [18], [19], [20], [21] regard the person ReID task as a classification problem trained with softmax loss (ID loss). In ID Embedding Net (IDENet) [15], [22], each person ID is considered a category of the classification
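As an illustration of the ID loss mentioned above, the following is a minimal NumPy sketch of softmax cross-entropy over person IDs. The function name `id_loss` and the array shapes are assumptions made for this example; it is not taken from the cited works.

```python
import numpy as np

def id_loss(logits, labels):
    """Mean softmax cross-entropy over a batch (hypothetical helper).

    logits: (N, num_ids) classifier outputs, one column per person ID.
    labels: (N,) integer person IDs in [0, num_ids).
    """
    # Subtract the row maximum for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the true ID for every sample.
    return -log_probs[np.arange(len(labels)), labels].mean()

# Toy batch: 2 images, 4 identities.
logits = np.array([[4.0, 0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0, 4.0]])
labels = np.array([0, 3])
loss = id_loss(logits, labels)
```

With uninformative (all-zero) logits the loss equals the logarithm of the number of identities, and it decreases as the classifier becomes more confident about the correct ID.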
Methods
In this section, we describe the details of Dynamically Matching Local Information and the structure of AlignedReID++ shown in Fig. 3.
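To make the idea of aligning horizontal stripes with a shortest path distance more concrete, here is a minimal NumPy sketch. It assumes that each image is described by H stripe features, that stripe distances are Euclidean distances squashed into [0, 1) (the normalization used in the original AlignedReID formulation, treated here as an assumption), and that the path may only move right or down so that the top-to-bottom order of stripes is preserved. The function names are hypothetical.

```python
import numpy as np

def stripe_distance_matrix(f, g):
    """Pairwise distances between the stripe features of two images.

    f, g: arrays of shape (H, C), one C-dimensional feature per stripe.
    Assumption: Euclidean distance d mapped to (e^d - 1) / (e^d + 1).
    """
    diff = f[:, None, :] - g[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    return np.tanh(d / 2.0)  # algebraically equal to (e^d - 1) / (e^d + 1)

def shortest_path_distance(dist):
    """Cost of the cheapest path from dist[0, 0] to dist[-1, -1].

    Only right and down moves are allowed (dynamic programming).
    """
    h, w = dist.shape
    cost = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            if i == 0 and j == 0:
                cost[i, j] = dist[i, j]
            elif i == 0:
                cost[i, j] = cost[i, j - 1] + dist[i, j]
            elif j == 0:
                cost[i, j] = cost[i - 1, j] + dist[i, j]
            else:
                cost[i, j] = min(cost[i - 1, j], cost[i, j - 1]) + dist[i, j]
    return cost[-1, -1]

# Usage with random stand-in features: 8 stripes, 16 channels each.
rng = np.random.default_rng(0)
f = rng.normal(size=(8, 16))
g = rng.normal(size=(8, 16))
d_local = shortest_path_distance(stripe_distance_matrix(f, g))
```

Because every stripe distance is bounded below 1, a stripe pair that cannot be matched (for example, body against background) adds only a limited cost to the path, while well-matched pairs determine where the path bends.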
Datasets
In this section, we introduce the person ReID datasets used in the paper. Since Market1501 [9] and DukeMTMCReID [10] are the most popular datasets in ReID research, we mainly use these two datasets for the ablation study. To better demonstrate the effectiveness of our proposed DMLI, we manually build two partial ReID datasets named Market1501-Partial and DukeMTMCReID-Partial. Additionally, we report the performance of AlignedReID++ on MSMT17 [11] and CUHK03 [7].
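The text above does not specify how the partial datasets are constructed. Purely as an illustration of how an inaccurate detection box could be simulated, the sketch below keeps a random vertical window of a person image. The cropping rule, the function name `random_partial_crop` and the `min_keep` parameter are all hypothetical and should not be read as the authors' procedure.

```python
import numpy as np

def random_partial_crop(image, min_keep=0.5, rng=None):
    """Keep a random vertical window of an image (hypothetical rule).

    image: array of shape (H, W, C).
    min_keep: smallest fraction of the height that is kept (assumption).
    """
    rng = np.random.default_rng() if rng is None else rng
    h = image.shape[0]
    # Number of rows to keep, clamped to a valid range.
    keep = int(round(h * rng.uniform(min_keep, 1.0)))
    keep = max(1, min(h, keep))
    # Random top offset so the window stays inside the image.
    top = int(rng.integers(0, h - keep + 1))
    return image[top:top + keep]

# Usage on a blank stand-in image of size 128 x 64.
img = np.zeros((128, 64, 3), dtype=np.uint8)
partial = random_partial_crop(img, min_keep=0.5, rng=np.random.default_rng(0))
```

A crop like this cuts off the head or the legs in many images, which is the kind of misalignment that fixed stripe-to-stripe matching cannot absorb.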
Visualization of DMLI
For the image pair in Fig. 2, we show the alignment results computed by our DMLI method in Fig. 4. Because the upper part of the right image mainly contains background, the feature distances between the first four blocks of the two images are very large. However, both the first stripe of the left image and the fifth stripe of the right image contain the human head, so the feature distance between them is small, and DMLI connects these two parts. In addition, DMLI reasonably aligns chest, leg, foot, etc of
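The kind of stripe-to-stripe alignment described above can be recovered from a stripe distance matrix by backtracking through a dynamic-programming table. The sketch below is an assumption-laden illustration: it takes a precomputed distance matrix, allows only right and down moves, and returns the matched index pairs. The function name `alignment_path` and the toy matrix are hypothetical.

```python
import numpy as np

def alignment_path(dist):
    """Return the (i, j) stripe pairs on the cheapest right/down path.

    dist[i, j] is the distance between stripe i of the first image and
    stripe j of the second image.
    """
    h, w = dist.shape
    cost = np.full((h, w), np.inf)
    cost[0, 0] = dist[0, 0]
    for i in range(h):
        for j in range(w):
            if i == 0 and j == 0:
                continue
            up = cost[i - 1, j] if i > 0 else np.inf
            left = cost[i, j - 1] if j > 0 else np.inf
            cost[i, j] = min(up, left) + dist[i, j]
    # Walk back from the bottom-right corner to the cheaper predecessor.
    i, j = h - 1, w - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        up = cost[i - 1, j] if i > 0 else np.inf
        left = cost[i, j - 1] if j > 0 else np.inf
        if up <= left:
            i -= 1
        else:
            j -= 1
        path.append((i, j))
    return path[::-1]

# Toy 3-stripe example: small values mark stripes that show the same body part.
toy = np.array([[0.1, 0.9, 0.9],
                [0.1, 0.1, 0.9],
                [0.9, 0.1, 0.1]])
pairs = alignment_path(toy)
```

In this toy case the path follows the small entries of the matrix, so stripes with similar content are paired while the large-distance cells are avoided wherever the right/down constraint permits.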
Conclusions and future works
In this paper, we propose DMLI, a method for dynamically matching local information that aligns the local stripes of a person image pair without extra supervision, for deep person re-identification. DMLI can easily be introduced into any CNN-based ReID method, and it is good at handling pose misalignments and other hard samples, especially partial ReID caused by erroneous detection bounding boxes. We then propose the AlignedReID++ framework by adding a local branch based on DMLI to a CNN.
Acknowledgment
This research is supported by the National Natural Science Foundation of China (61633019).
References (55)
- et al. Multi-type attributes driven multi-camera person re-identification. Pattern Recogn. (2018)
- et al. Spherereid: deep hypersphere manifold embedding for person re-identification. J. Vis. Commun. Image Represen. (2019)
- et al. Deep self-paced learning for person re-identification. Pattern Recogn. (2018)
- et al. Attention driven person re-identification. Pattern Recogn. (2019)
- et al. What-and-where to match: deep spatially multiplicative integration networks for person re-identification. Pattern Recogn. (2018)
- et al. Deep ranking model by large adaptive margin learning for person re-identification. Pattern Recogn. (2018)
- et al. Alignedreid: Surpassing human-level performance in person re-identification. (2017)
- et al. Pose invariant embedding for deep person re-identification. IEEE Transactions on Image Processing (2019)
- et al. Spindle net: Person re-identification with human body region guided feature decomposition and fusion. CVPR (2017)
- et al. A pose-sensitive embedding for person re-identification with expanded cross neighborhood re-ranking. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018)
- Glad: Global-local-alignment descriptor for pedestrian retrieval. Proceedings of the 2017 ACM on Multimedia Conference
- Pose-driven deep convolutional model for person re-identification. Computer Vision (ICCV), 2017 IEEE International Conference on
- Deepreid: Deep filter pairing neural network for person re-identification. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
- Person re-identification by multi-channel parts-based cnn with improved triplet loss function. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
- Scalable person re-identification: a benchmark. Computer Vision, IEEE International Conference
- Performance measures and a data set for multi-target, multi-camera tracking. European Conference on Computer Vision workshop on Benchmarking Multi-Target Tracking
- Person transfer gan to bridge domain gap for person re-identification. CVPR
- A discriminatively learned cnn embedding for person reidentification. ACM Trans. Multime. Comput., Commun. Applica. (TOMM)
- Features for multi-target multi-camera tracking and re-identification. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
- Embedding deep metric for person re-identification: a study against large variations. European Conference on Computer Vision
- Person re-identification in the wild. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Deep transfer learning for person re-identification. 2018 IEEE Fourth International Conference on Multimedia Big Data (BigMM)
- Person re-identification using cnn features learned from combination of attributes. Pattern Recognition (ICPR), 2016 23rd International Conference on
- Beyond part models: person retrieval with refined part pooling (and a strong convolutional baseline). The European Conference on Computer Vision (ECCV)
- Person re-identification: past, present and future
- Person re-identification with cascaded pairwise convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
Hao Luo received the BSc degree in Automation from Zhejiang University, PR China, in 2015. Currently, he is doing his PhD at Zhejiang University with interests in pattern recognition, computer vision and deep learning.
Wei Jiang received the PhD degree in Pattern Recognition from Tokyo Institute of Technology, Japan. He is currently an Associate Professor in the Institute of Cyber-Systems and Control, Zhejiang University. His current research interests include large-scale pattern recognition, computer vision, deep learning, and control systems.
- ☆ This paper is the extended version of our unsubmitted preprint paper [1].
- 1 Zhejiang University, 2Megvii Inc (Face++)
- 2 https://github.com/michuanhaohao/AlignedReID