Elsevier

Pattern Recognition

Volume 100, April 2020, 107120

Dynamic imposter based online instance matching for person search

https://doi.org/10.1016/j.patcog.2019.107120

Abstract

Person search aims to locate a target person matching a given query in a gallery of unconstrained whole-scene images. The task is challenging because pedestrian bounding boxes are unavailable, each labeled identity has only a few samples, and existing datasets contain a large number of unlabeled persons. To address these issues, we propose a novel end-to-end learning framework for person search that handles pedestrian detection and person re-identification jointly. To co-learn the two tasks and exploit the information in unlabeled persons, we formulate a novel and highly efficient Dynamic Imposter based Online Instance Matching (DI-OIM) loss. The DI-OIM loss is inspired by the observation that pedestrians appearing in the same image must have different identities, so we assign dynamic pseudo-labels to the unlabeled persons. The pseudo-labeled persons, together with the labeled persons, are then used to learn powerful feature representations. Experiments on the CUHK-SYSU and PRW datasets demonstrate that our method outperforms other state-of-the-art algorithms. Moreover, it is more memory-efficient than existing methods.

Introduction

Person re-identification is the task of searching for a person of interest across non-overlapping camera views [1]. It has attracted growing research interest because of its value in applications such as criminal spotting [2], multi-pedestrian tracking [3] and intelligent security [4]. Numerous efforts have been devoted to person re-identification in recent decades [5], [6]. However, current re-identification techniques are still far from being applicable in practical intelligent monitoring systems. One key reason is that typical re-identification systems assume that person images are already well cropped and aligned from the scene images. In real-world applications, however, we usually need to find a target person in whole images or video frames for which no pedestrian boxes are available.

Person search is a new and valuable topic that bridges the gap between person re-identification and real-world applications [7], [8]. We illustrate the difference between person search and conventional re-identification in Fig. 1. The new task requires close cooperation between the detector and the identifier. Recently, considerable effort has been devoted to person search. Existing techniques can be roughly divided into two categories: detection-free methods and detection-based methods. Detection-free methods recursively shrink the focus area until the target is precisely localized [9], [10]. However, they become computationally prohibitive as the gallery size increases. Detection-based methods most commonly divide the problem into separate pedestrian detection and person re-identification tasks [8], [11]. However, the two tasks are highly correlated. First, feature information can be shared between them, which avoids accumulated error and saves substantial time on crowded images. Second, detection and re-identification complement each other: the quality of the detections largely determines the recognition accuracy, while the recognition results provide feedback for refining the detected locations. Therefore, it is beneficial to learn pedestrian detection and person re-identification jointly.

Despite the considerable progress achieved in recent years, learning powerful features for person matching remains challenging. The main reason is that each identity has very few training samples, and person search datasets contain a large number of unlabeled identities. It is difficult to learn discriminative person representations with many classes and few samples per class. Therefore, some approaches attempt to exploit the information in unlabeled pedestrians to strengthen the representation. For example, the Online Instance Matching (OIM) loss [7] treats all unlabeled persons as a negative class. It pushes a labeled person away from the other labeled identities stored in a lookup table and from the unlabeled persons maintained in a circular queue. Nevertheless, the unlabeled persons themselves do not participate in the training process. To solve this problem, the Instance Enhancing Loss (IEL) [12] was proposed to integrate unlabeled persons into the feature learning process. It selectively assigns new unlabeled persons to the labeled identities they are most similar to. However, the selected unlabeled persons are actually hard negative samples. To learn discriminative representations, those hard negatives should instead be kept away from the corresponding labeled identities.
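
As a rough illustration of the OIM-style matching described above, the following NumPy sketch scores one labeled feature against a lookup table of labeled identities and a circular queue of unlabeled features. The function names, the temperature value and the use of L2-normalized features are assumptions made for this example, not details taken from this section, and the update of the lookup table during training is omitted.

```python
import numpy as np


def oim_probabilities(x, lookup_table, queue, temperature=0.1):
    """Softmax over similarities to labeled prototypes and queued unlabeled features.

    x:            (d,) L2-normalized feature of one labeled person
    lookup_table: (L, d) one L2-normalized prototype per labeled identity
    queue:        (Q, d) L2-normalized features of recently seen unlabeled persons
    Returns an (L + Q,) probability vector; the first L entries are labeled identities.
    """
    logits = np.concatenate([lookup_table @ x, queue @ x]) / temperature
    logits = logits - logits.max()  # shift for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()


def oim_loss(x, target, lookup_table, queue, temperature=0.1):
    """Negative log-probability assigned to the labeled identity `target`."""
    probs = oim_probabilities(x, lookup_table, queue, temperature)
    return -np.log(probs[target])


def push_to_queue(queue, head, feature):
    """Overwrite the slot at `head` in the circular queue and return the next head."""
    queue[head] = feature
    return (head + 1) % len(queue)
```

In this sketch the queued unlabeled features appear only in the denominator of the softmax, which matches the remark that they act as negatives without being trained on themselves.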

To address the above issues, we propose a novel end-to-end person search framework that integrates pedestrian detection and person identification to improve overall accuracy and reduce computation. To make better use of the unlabeled persons, we propose a novel Dynamic Imposter based Online Instance Matching (DI-OIM) loss. The loss is inspired by the observation that pedestrians appearing in the same image must have different identities. We therefore assign dynamic pseudo-labels to the unlabeled persons. The representations of pseudo-labeled persons are called imposters, since they do not belong to any of the labeled identities. The features of all the labeled persons are stored in a lookup table. The imposters and the lookup table are used together to optimize the framework, so that all distinct persons are pushed away from each other. With the proposed DI-OIM loss, our end-to-end model is both efficient and effective.
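
The exact DI-OIM formulation is not given in this excerpt, so the NumPy sketch below is only one plausible reading of the description above. It assumes that each unlabeled person in the current image receives a temporary class index placed after the labeled identities, and that a softmax cross-entropy is taken over the lookup table concatenated with the current imposters. The function name, the temperature and the averaging over samples are all hypothetical choices for illustration.

```python
import numpy as np


def di_oim_loss_sketch(labeled_feats, labeled_ids, unlabeled_feats,
                       lookup_table, temperature=0.1):
    """Illustrative loss over labeled persons and dynamic imposters (assumed form).

    labeled_feats:   (n, d) L2-normalized features of labeled persons
    labeled_ids:     (n,) integer identity indices into lookup_table
    unlabeled_feats: (m, d) L2-normalized features of unlabeled persons from
                     the same image(s); these act as the imposters
    lookup_table:    (L, d) L2-normalized prototypes of the labeled identities
    """
    num_ids = lookup_table.shape[0]
    # Dynamic pseudo-labels: imposter k gets class index L + k for this step only.
    pseudo_ids = num_ids + np.arange(len(unlabeled_feats))
    # Class bank: labeled prototypes followed by the current imposters.
    bank = np.concatenate([lookup_table, unlabeled_feats], axis=0)

    feats = np.concatenate([labeled_feats, unlabeled_feats], axis=0)
    targets = np.concatenate([labeled_ids, pseudo_ids])

    logits = feats @ bank.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Mean negative log-probability of each person's own (pseudo-)label.
    return -log_probs[np.arange(len(targets)), targets].mean()
```

Under these assumptions the pseudo-labels exist only for the current step and no queue of past unlabeled features is kept, which is at least consistent with the memory advantage claimed in the abstract. The loss also grows when two imposters from the same image have similar features, so minimizing it pushes them apart.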

In summary, our main contributions are threefold:

  • An end-to-end trainable framework is proposed for person search. It integrates pedestrian detection and person re-identification in a unified model. Co-learning the two tasks makes the learned features more informative.

  • A novel DI-OIM loss is proposed to exploit the information in unlabeled pedestrians. The loss not only distinguishes labeled pedestrians of different identities, but also pushes the unlabeled pedestrians away from each other.

  • By unifying the detection and re-identification tasks, the proposed model achieves state-of-the-art performance on the CUHK-SYSU [7] and PRW [8] datasets.

Pedestrian detection

Pedestrian detection aims to localize pedestrians in images and generate bounding boxes for persons. In person search systems, pedestrian detection plays an important role. A large number of efforts have been made to automatically detect pedestrians in natural scenes. Traditional methods are mainly based on handcrafted features and linear classifiers, e.g. Aggregated Channel Features (ACF) [13] and Locally Decorrelated Channel Features (LDCF) [14]. Recently, Convolutional Neural Networks (CNNs)

The proposed approach

In this section, we first describe the overall architecture of our framework. We then briefly explain the OIM loss and the IEL. After that, we elaborate on the proposed DI-OIM loss and describe the inference process.

Experiments

In this section, we thoroughly evaluate our method on two public person search datasets. First, we briefly introduce the datasets, the evaluation protocols and the implementation details. Second, we analyze the proposed loss and compare it with other related losses. Third, to validate the effectiveness of our method, we make extensive comparisons with state-of-the-art algorithms. Finally, we conduct further analysis and discussion.

Conclusion

In this work, we focus on the problem of unconstrained person search, where pedestrian bounding boxes are unavailable. We propose an end-to-end framework to simultaneously consider pedestrian detection and person re-identification. Since many unlabeled pedestrians exist in person search datasets, a novel DI-OIM loss is proposed to exploit the information of unlabeled persons. Inspired by the observation that pedestrians within the same image obviously have different identities, we assign

Acknowledgment

This work is supported in part by the National Natural Science Foundation of China (NSFC), Nos. 61725202, 61751212 and 61771088.

References (39)

  • L. Zhang et al. Learning a discriminative null space for person re-identification. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (2016)

  • T. Xiao et al. Joint detection and identification feature learning for person search. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (2017)

  • L. Zheng et al. Person re-identification in the wild. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (2017)

  • H. Liu et al. Neural person search machines. Proceedings of IEEE International Conference on Computer Vision (2017)

  • X. Chang et al. RCAA: relational context-aware agents for person search. Proceedings of European Conference on Computer Vision (2018)

  • D. Chen et al. Person search via a mask-guided two-stream CNN model. Proceedings of European Conference on Computer Vision (2018)

  • W. Shi et al. Instance enhancing loss: deep identity-sensitive feature embedding for person search. Proceedings of IEEE International Conference on Image Processing (2018)

  • P. Dollár et al. Fast feature pyramids for object detection (2014)

  • W. Nam et al. Local decorrelation for improved pedestrian detection. Advances in Neural Information Processing Systems (2014)