Pose-guided part matching network via shrinking and reweighting for occluded person re-identification

https://doi.org/10.1016/j.imavis.2021.104186

Abstract

Occluded person re-identification (ReID) is a challenging task that aims to retrieve an occluded person across multiple non-overlapping cameras. To address this problem, we propose a novel framework named the Shrinking and Reweighting Network (SRNet), which jointly learns global features by shrinking and reweights part features for matching in an end-to-end manner. Specifically, we use a strong backbone that combines several effective designs and training tricks to learn robust and discriminative global features. Even so, occlusion introduces noise-related features, so we utilize the Deep Residual Shrinkage Module (DRS Module) to eliminate unimportant features by automatically determining soft thresholds. When aligning two groups of part features from two images, we view the alignment as a graph matching problem and design an effective Reweight Module for Part Matching (RMPM) that learns self-adaptive weights for part features before the part matching stage; RMPM alleviates the influence of meaningless part features during matching. Extensive experimental results on occluded, partial, and holistic re-id datasets clearly demonstrate that the proposed method achieves performance competitive with state-of-the-art methods. In particular, our framework outperforms the state of the art by 8.9% mAP on the Occluded-Duke dataset. Code is available at https://github.com/chenxiangzZ/SRNet.

Introduction

Person re-identification (ReID) [1], [2], [3] aims to retrieve a pedestrian of interest across non-overlapping camera views. Combined with pedestrian detection and tracking, ReID is widely used in video surveillance, security, and smart-city applications.

With the explosive development of deep learning, the performance of ReID has improved significantly in recent years. However, previous works [4], [5], [6], [7], [8] do not perform well on occluded datasets because they assume the entire body of the pedestrian is visible. In real-world scenarios, occlusion is often severe: pedestrians are frequently occluded by static obstacles such as trees, cars, and roadblocks, and even in a crowd, most of the target pedestrian's body may be occluded by other pedestrians. It is therefore essential to develop an effective method for matching person images under occlusion, known as the occluded person Re-ID problem [9], [10], [11], [12]. Occluded ReID is more challenging for the following reasons: ① Part-based features have been proven effective [4], but part-to-part matching suffers from misalignment, which becomes more severe under heavy occlusion. ② Occlusion not only reduces the discriminative information in the image but also introduces extra noise. ③ When a part region is occluded or contains outliers, hard one-to-one part matching is meaningless and not robust.

In recent years, some works [11], [12], [13] combined with pose estimation can address ①: by locating the semantic parts of the human body, they divide person images so as to avoid misalignment. We propose a novel framework named SRNet to solve ② and ③. As Fig. 1(a) shows, both datasets contain severely occluded scenes, so the network that extracts global features should focus on the non-occluded pedestrian areas; attending to occluded areas introduces additional noise, which is problem ②. We therefore first adopt a strong backbone that incorporates multiple tricks to improve global feature extraction for occluded Re-ID; as shown in Fig. 1(b, c), the strong backbone outperforms ResNet50 [14]. However, the network still inevitably attends to areas beyond the pedestrian. To alleviate this, we further improve performance by adding the DRS Module [15] to the strong backbone: it learns a soft threshold for each channel and sets features below the threshold to 0, eliminating unimportant features. The result of the strong backbone with the DRS Module is illustrated in Fig. 1(d): the improved network focuses on non-occluded pedestrian areas more accurately and concentratedly, and learns more robust and discriminative features. As for ③, Fig. 1(a) shows that most pedestrian key-points are occluded. To weaken the influence of occluded part features in the part matching stage, we design a Reweight Module for Part Matching (RMPM) that learns self-adaptive weights for part features, so that the parts the network focuses on receive larger weights.
We do not use key-point confidence to decide whether a part is occluded, because the confidence depends entirely on the pose estimation model: it only represents the probability of the key-point and does not equal the degree of attention the network should pay. RMPM makes the part matching stage more robust and reduces the influence of meaningless part features in the point-to-point corresponding graph matching.
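The channel-wise soft thresholding that the DRS Module applies can be illustrated with a minimal numpy sketch. This is an assumption-laden simplification, not the paper's implementation: the per-channel scaling logits are placeholders for what a real DRS module would produce with a small learned sub-network, and the threshold is taken as a fraction of each channel's mean absolute activation so it can never zero out the whole channel.

```python
import numpy as np

def soft_threshold(x, tau):
    """Soft thresholding: shrink values toward zero, zeroing out |x| <= tau."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def drs_shrink(features, scale_logits):
    """Channel-wise shrinkage in the spirit of the DRS Module.

    features:     (C, H, W) feature map.
    scale_logits: (C,) per-channel logits (hypothetical stand-in for the
                  output of DRS's small FC sub-network).

    Each channel's threshold is alpha_c * mean(|x_c|), where
    alpha_c = sigmoid(scale_logits_c) lies in (0, 1), so the threshold
    never exceeds the channel's mean absolute activation.
    """
    abs_mean = np.abs(features).mean(axis=(1, 2))   # (C,) per-channel |x| mean
    alpha = 1.0 / (1.0 + np.exp(-scale_logits))     # (C,) sigmoid gate
    tau = (alpha * abs_mean)[:, None, None]         # (C, 1, 1) thresholds
    return soft_threshold(features, tau)
```

Weak activations (often noise from occluders) fall below the learned threshold and are set to 0, while strong activations are merely shrunk, which matches the soft-thresholding behavior described above.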

In conclusion, the main contributions of this paper are summarized as follows:

  • (1)

    We propose a novel network SRNet that can learn more robust and highly discriminative global features, and obtain soft thresholds through DRS Module to eliminate the noise-related features.

  • (2)

    We design an effective Reweight Module for Part Matching (RMPM). By assigning an adaptive weight to each part feature, meaningless part features (occluded or outlier) are weakened, which benefits part matching and yields robust alignment.

  • (3)

    Extensive experimental results on occluded and partial re-id datasets clearly demonstrate that the proposed method achieves remarkable performance compared with state-of-the-art methods. Our method also achieves competitive performance on holistic datasets.
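The reweighting idea behind RMPM (contribution 2) can be sketched as follows. This is a hedged toy version under our own assumptions, not the paper's module: here each part's weight comes from a softmax over part-feature norms (a hypothetical proxy for how strongly the network responds to that part), whereas the actual RMPM learns these weights.

```python
import numpy as np

def part_weights(part_feats):
    """Self-adaptive part weights (hypothetical stand-in for RMPM).

    part_feats: (P, D) array of P part features. Parts with stronger
    responses (larger L2 norms) receive larger softmax weights, so
    occluded or outlier parts with weak responses are down-weighted.
    """
    scores = np.linalg.norm(part_feats, axis=1)   # (P,) response strengths
    e = np.exp(scores - scores.max())             # numerically stable softmax
    return e / e.sum()

def reweighted_distance(feats_a, feats_b):
    """One-to-one part distance, weighted by both images' part weights."""
    wa, wb = part_weights(feats_a), part_weights(feats_b)
    d = np.linalg.norm(feats_a - feats_b, axis=1)  # (P,) per-part distances
    return float(np.sum(wa * wb * d))
```

A part that is occluded in either image contributes little to the final distance, since the product of the two weights suppresses it; this is the sense in which reweighting makes the subsequent matching stage robust to meaningless parts.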

Section snippets

Person re-identification

Most existing deep-learning-based ReID works [4], [5], [16], [17] automatically learn complex features and outperform hand-crafted descriptors [18], [19], [20]. These works can be grouped into global feature learning and local feature learning according to the area from which features are extracted. Global feature learning extracts a single global feature vector from each person image; however, using global features alone cannot meet the performance requirements due to

The proposed method

In this section, we present the pipeline of our SRNet in Fig. 2. The framework includes two parts: shrinking of global features and a reweight module for part matching. In Section 3.1, we introduce the extraction of global features through shrinking: first the strong backbone, which combines several effective designs and training tricks to learn robust and discriminative global features, and then the DRS Module, which eliminates unimportant features by

Train and inference

In summary, we use four loss functions: the backbone loss L_backbone for more robust and highly discriminative global features, the part loss L_part for maintaining the discriminative power of each part feature, and, during the training stage, the verification loss L_Ver and the permutation cross-entropy loss L_pce. The overall objective function of our framework is formulated in Eq. (14), where λ_backbone, λ_Ver, and λ_pce are the weights of the corresponding terms:

L = L_part + λ_backbone · L_backbone + λ_Ver · L_Ver + λ_pce · L_pce (14)
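The weighted sum in Eq. (14) is straightforward to express in code. The default weight values below are illustrative placeholders only; the paper's tuned settings are not reproduced here.

```python
def total_loss(l_part, l_backbone, l_ver, l_pce,
               lam_backbone=1.0, lam_ver=1.0, lam_pce=1.0):
    """Eq. (14): L = L_part + λ_backbone·L_backbone + λ_Ver·L_Ver + λ_pce·L_pce.

    Each l_* is the scalar value of the corresponding loss term; each lam_*
    is its weight (placeholder defaults, not the paper's choices).
    """
    return l_part + lam_backbone * l_backbone + lam_ver * l_ver + lam_pce * l_pce
```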

The whole training

Datasets

To demonstrate the efficacy of our method on the occlusion problem, we evaluate the proposed SRNet on two occluded datasets, Occluded-Duke [11] and Occluded-ReID [9], and one partial dataset, Partial-REID [25]. We also evaluate our method on two holistic datasets: Market-1501 [34] and DukeMTMC-reID [35].

  • 1)

    Occluded-Duke [11] is modified from the DukeMTMC-reID [35] dataset; it contains 15,618 training images, 17,661 gallery images, and 2210 occluded query images. Occluded-ReID is captured by the mobile

Conclusion

In this paper, we propose a novel framework for occluded ReID. To extract more robust and discriminative global features, we use a strong backbone that combines several effective designs and training tricks. To make the area the network attends to more precise and concentrated, we utilize the Deep Residual Shrinkage Module (DRS Module) to eliminate unimportant features. To weaken the influence of meaningless part features (occluded, outlier) in the part matching

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (44)

  • B. Ma et al.

    Covariance descriptor based on bio-inspired features for person re-identification and face verification

    Image Vis. Comput.

    (2014)
  • S. Gong et al.

    Person re-identification

  • L. Zheng et al.

    Person re-identification: Past, present and future

    (2016)
  • M. Ye et al.

    Deep learning for person re-identification: a survey and outlook

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2021)
  • Y. Sun et al.

    Beyond part models: Person retrieval with refined part pooling (and a strong convolutional baseline)

  • G. Wang et al.

    Learning discriminative features with multiple granularities for person re-identification

  • W. Li et al.

    Harmonious attention network for person re-identification

  • H. Huang et al.

    Adversarially occluded samples for person re-identification

  • Z. Zheng et al.

    Pedestrian alignment network for large-scale person re-identification

    IEEE Trans. Circuits Syst. Video Technol.

    (2018)
  • J. Zhuo et al.

    Occluded person re-identification

  • L. He et al.

    Foreground-aware pyramid reconstruction for alignment-free occluded person re-identification

  • J. Miao et al.

    Pose-guided feature alignment for occluded person re-identification

  • G. Wang et al.

    High-order information matters: Learning relation and topology for occluded person re-identification

  • S. Gao et al.

    Pose-guided visible part matching for occluded person re-id

  • K. He et al.

    Deep residual learning for image recognition

  • M. Zhao et al.

    Deep residual shrinkage networks for fault diagnosis

    IEEE Trans. Ind. Inform.

    (2019)
  • A. Hermans et al.

    In defense of the triplet loss for person re-identification

    (2017)
  • G.-A. Wang et al.

    Cross-modality paired-images generation for rgb-infrared person re-identification

  • Y. Yang et al.

    Salient color names for person re-identification

  • S. Liao et al.

    Person re-identification by local maximal occurrence representation and metric learning

  • X. Zhang et al.

    Alignedreid: Surpassing human-level performance in person re-identification

    (2017)
  • H. Zhao et al.

    Spindle net: Person re-identification with human body region guided feature decomposition and fusion
