Pose-guided part matching network via shrinking and reweighting for occluded person re-identification
Introduction
Person re-identification (ReID) [1], [2], [3] aims to retrieve a pedestrian of interest across non-overlapping camera views. Combined with pedestrian detection and tracking, ReID is widely used in video surveillance, security, and smart-city applications.
With the rapid development of deep learning, ReID performance has improved significantly in recent years. However, previous works [4], [5], [6], [7], [8] do not perform well on occluded datasets because they assume that the entire body of the pedestrian is visible. In real-world scenarios occlusion is often severe: pedestrians are frequently occluded by static obstacles such as trees, cars, and roadblocks, and in a crowd most of the target pedestrian's body may be occluded by other pedestrians. It is therefore essential to find an effective method for matching person images under occlusion, which is known as the occluded person ReID problem [9], [10], [11], [12]. Occluded ReID is more challenging for the following reasons: ① Part-based features have proved effective [4], but part-to-part matching suffers from misalignment, which becomes worse under severe occlusion. ② Occlusion not only reduces the discriminative information in the image but also introduces extra noise. ③ If a part region is occluded or is an outlier, hard one-to-one part matching is meaningless and not robust.
In recent years, some works [11], [12], [13] have addressed ① by incorporating pose estimation, which locates the semantic parts of the human body so that person images can be divided into parts without misalignment. To address ② and ③, we propose a novel framework named SRNet. As Fig. 1(a) shows, both datasets contain severely occluded scenes, so a network that extracts global features should focus precisely on the non-occluded pedestrian areas; attending to the occluded areas introduces additional noise, which is problem ②. We therefore first adopt a strong backbone that incorporates multiple training tricks to improve global feature extraction for occluded ReID. As shown in Fig. 1(b, c), the strong backbone performs better than ResNet50 [14]. However, the network still inevitably attends to areas other than the pedestrian. To alleviate this problem, we further add the DRS Module [15] to the strong backbone: it learns a soft threshold for each channel and sets the unimportant features below that threshold to 0. The result of the strong backbone with the DRS Module is illustrated in Fig. 1(d): the improved network focuses on non-occluded pedestrian areas more accurately and in a more concentrated way, and can learn more robust and discriminative features. As for ③, Fig. 1(a) shows that most of the pedestrians' key-points are occluded. To weaken the influence of occluded part features in the part matching stage, we design a Reweight Module for Part Matching (RMPM) that learns self-adaptive weights for part features, so that the part features the network focuses on receive larger weights.
We do not use key-point confidence to decide whether a part is occluded, because the confidence depends entirely on the output of the pose estimation model; it only represents the probability of the key-point and is not equivalent to how much the network should focus on that part. RMPM makes the part matching stage more robust and reduces the influence of meaningless part features in the point-to-point graph matching.
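The reweighting idea can be illustrated with a minimal sketch. It does not reproduce RMPM or the graph matching used in the paper; it only shows how per-part weights can suppress an occluded part in a part-to-part similarity. The function names, the cosine similarity, and the normalisation by the product of the two weights are assumptions made for illustration, and the weights are passed in as fixed numbers rather than learned.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def reweighted_similarity(parts_a, parts_b, w_a, w_b):
    """Weighted average of part-to-part cosine similarities.

    parts_a, parts_b: lists of part feature vectors, aligned by part index.
    w_a, w_b: one weight per part in [0, 1]; in a trained model these would
              be predicted by the network (here they are hypothetical inputs).
    """
    pair_weights = [wa * wb for wa, wb in zip(w_a, w_b)]
    total = sum(pair_weights)
    if total == 0.0:
        return 0.0  # no part is trusted in both images
    sims = [cosine(pa, pb) for pa, pb in zip(parts_a, parts_b)]
    return sum(w * s for w, s in zip(pair_weights, sims)) / total


if __name__ == "__main__":
    # Three parts; the third part of image B is occluded and mismatches.
    parts_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
    parts_b = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    print(reweighted_similarity(parts_a, parts_b, [1, 1, 1], [1, 1, 1]))
    print(reweighted_similarity(parts_a, parts_b, [1, 1, 1], [1, 1, 0]))
```

With uniform weights the occluded third part pulls the similarity down to about 0.667; giving that part a weight of 0 in image B restores the similarity of the two visible parts, 1.0.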
In summary, the main contributions of this paper are as follows:
- (1) We propose a novel network, SRNet, that learns more robust and highly discriminative global features and obtains soft thresholds through the DRS Module to eliminate noise-related features.
- (2) We design an effective Reweight Module for Part Matching (RMPM). By assigning adaptive weights to each part feature, meaningless part features (occluded, outlier) are weakened, which benefits part matching and helps learn robust alignment.
- (3) Extensive experimental results on occluded and partial ReID datasets demonstrate that the proposed method achieves remarkable performance compared with state-of-the-art methods. In addition, our method achieves competitive performance on holistic datasets.
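The soft-threshold shrinkage mentioned in contribution (1) can be sketched in a few lines. This is a minimal illustration, not the DRS Module itself: it assumes the common shrinkage formulation in which each channel's threshold is a scale in (0, 1) multiplied by the mean absolute activation of that channel. In a trained network the scales would be predicted by a small learned branch; here they are fixed, hypothetical inputs, and the function names are illustrative.

```python
def soft_threshold(x, tau):
    """Shrink x toward zero by tau; values with |x| <= tau become exactly 0."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0


def shrink_channels(channels, scales):
    """Apply one soft threshold per channel.

    channels: list of channels, each a flat list of activations.
    scales:   one value in (0, 1) per channel (assumed given, not learned).
    """
    shrunk = []
    for values, scale in zip(channels, scales):
        mean_abs = sum(abs(v) for v in values) / len(values)
        tau = scale * mean_abs  # channel-specific threshold
        shrunk.append([soft_threshold(v, tau) for v in values])
    return shrunk


if __name__ == "__main__":
    features = [[4.0, -0.5, 0.5, -3.0], [1.0, 1.0, -1.0, 1.0]]
    print(shrink_channels(features, scales=[0.5, 0.5]))
    # prints [[3.0, 0.0, 0.0, -2.0], [0.5, 0.5, -0.5, 0.5]]
```

In the first channel the threshold is 1.0, so the two weak activations (±0.5) are set to 0 while the strong ones are kept, reduced in magnitude by the threshold.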
Person re-identification
Most existing ReID works [4], [5], [16], [17] are based on deep learning and learn complex features automatically, outperforming hand-crafted descriptors [18], [19], [20]. These works can be grouped into global feature learning and local feature learning, according to the area from which the features are extracted. Global feature learning extracts a global feature vector from each person image; however, using global features alone cannot meet the performance requirements due to
The proposed method
In this section, we present the pipeline of our SRNet, shown in Fig. 2. The framework includes two parts: shrinking of the global features and a reweight module for part matching. In Section 3.1, we introduce the extraction of global features through shrinking: we first introduce the strong backbone, which combines some effective designs and training tricks to learn robust and discriminative global features, and then introduce the DRS Module, so as to eliminate unimportant features by
Train and inference
In summary, we use four loss functions during the training stage: a loss for more robust and highly discriminative global features, a loss for maintaining the discriminative power of each part feature, the verification loss, and the permutation cross-entropy loss. The overall objective function of our framework is formulated in Eq. (14), where λbackbone, λVer, λpce are the weights of the corresponding terms
The whole training
Datasets
To demonstrate the efficacy of our method on the occlusion problem, we evaluate the proposed SRNet on two occluded datasets, Occluded-Duke [11] and Occluded-ReID [9], and one partial dataset, Partial-REID [25]. We also evaluate our method on two holistic datasets: Market-1501 [34] and DukeMTMC-reID [35].
- 1) Occluded-Duke [11] is modified from the DukeMTMC-reID [35] dataset. It contains 15,618 training images, 17,661 gallery images, and 2,210 occluded query images. Occluded-ReID is captured by the mobile
Conclusion
In this paper, we propose a novel framework for occluded ReID. To extract more robust and discriminative global features, we use a strong backbone that combines some effective designs and training tricks. To make the area that the network attends to more precise and concentrated, we utilize the Deep Residual Shrinkage Module (DRS Module) to eliminate unimportant features. To weaken the influence of meaningless part features (occluded, outlier) in the part matching
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (44)
- et al. Covariance descriptor based on bio-inspired features for person re-identification and face verification. Image Vis. Comput. (2014)
- et al. Person re-identification
- et al. Person re-identification: Past, present and future (2016)
- et al. Deep learning for person re-identification: a survey and outlook. IEEE Trans. Pattern Anal. Mach. Intell. PP (2021)
- et al. Beyond part models: Person retrieval with refined part pooling (and a strong convolutional baseline)
- et al. Learning discriminative features with multiple granularities for person re-identification
- et al. Harmonious attention network for person re-identification
- et al. Adversarially occluded samples for person re-identification
- et al. Pedestrian alignment network for large-scale person re-identification. IEEE Trans. Circuits Syst. Video Technol. (2018)
- et al. Occluded person re-identification