Unsupervised person re-identification by Intra–Inter Camera Affinity Domain Adaptation☆
Introduction
Person re-identification (re-ID) aims to retrieve images of a specified person across multiple cameras positioned at different locations. Re-ID is widely used in intelligent video surveillance, city management, criminal tracing, etc. It is a challenging computer-vision task, since significant variations in body pose, view angle, and illumination across views can make two images of the same person look highly different, and vice versa. Thanks to the successful application of deep learning, impressive progress has been made on person re-identification under the supervised framework [1], [2], [3], [4], [5], [6]. Nevertheless, when tested on unseen datasets, supervised models usually suffer dramatic performance degradation because of the domain discrepancy caused by variations in background, illumination, camera placement, etc. Gathering and labeling sufficient training images for each specific scene would address this, but doing so is laborious and costly.
To address this problem, increasing efforts are being made on unsupervised domain adaptation (UDA) based person re-ID, which aims to adapt a model to an unlabeled target dataset by exploiting information from a labeled source dataset. These UDA person re-ID methods focus on two aspects: (a) applying domain adaptation techniques to reduce the inconsistency between source and target distributions [7], [8], [9], [10], [11]; (b) assigning the same pseudo-labels to similar target instances via clustering, k-NN search, etc., and training with them [12], [13], [14]. The former alleviates domain inconsistency between source and target, while the latter exploits the potential relations among target instances and retrains the model with the assigned pseudo-labels. The latter, pseudo-label-based branch dominates the state of the art. These methods usually consist of two steps: (1) obtaining a preliminary discriminative model by supervised training on the annotated source data; (2) iteratively predicting pseudo-labels on the target domain and fine-tuning the model with them.
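The two-step pipeline above hinges on how pseudo-labels are assigned. The following is a minimal, dependency-free sketch, not the authors' implementation: it mimics density-based clustering (in the spirit of DBSCAN) by linking L2-normalized features that lie within a distance `eps` of each other and marking components smaller than `min_pts` as noise (label `-1`), which are then excluded from fine-tuning. The thresholds are hypothetical.

```python
import math

def l2_normalize(v):
    """Project a feature vector onto the unit sphere."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def assign_pseudo_labels(features, eps=0.6, min_pts=2):
    """Greedy density-style clustering of target features.

    Samples whose normalized features are within `eps` are linked into
    one component; components with fewer than `min_pts` members are
    treated as noisy and receive the pseudo-label -1.
    """
    feats = [l2_normalize(f) for f in features]
    n = len(feats)
    labels = [-2] * n          # -2 marks "unvisited"
    cluster = 0
    for i in range(n):
        if labels[i] != -2:
            continue
        stack, members = [i], []
        labels[i] = cluster
        while stack:            # grow the connected component of i
            j = stack.pop()
            members.append(j)
            for k in range(n):
                if labels[k] == -2 and math.dist(feats[j], feats[k]) < eps:
                    labels[k] = cluster
                    stack.append(k)
        if len(members) < min_pts:   # too small: discard as noise
            for j in members:
                labels[j] = -1
        else:
            cluster += 1
    return labels
```

In practice the features would come from the source-pretrained extractor, and clustering would be re-run at every fine-tuning round as the feature space improves.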
Despite its promising performance for person re-ID, the clustering-based branch is mainly limited by clustering accuracy, which cannot always be guaranteed. The assigned pseudo-labels for target samples can therefore be noisy, and training the model with them inevitably degrades performance. The main reason is that each identity can be recorded by multiple cameras with different parameters and environments, which may significantly change the identity's appearance. In other words, the distribution gaps among cameras make it difficult both to identify samples of the same identity and to optimize intra-class feature similarity. Thus, not only the domain gap between the source and target domains but also the distribution gaps between target cameras need to be alleviated for the UDA person re-ID task. However, existing UDA person re-ID methods cannot address these two problems simultaneously, so their cross-domain re-ID performance is limited.
In this study, we address these problems with a novel unsupervised domain adaptation model named Intra–Inter Camera Affinity Domain Adaptation (ICADA). To better remove noisy pseudo-labels and bridge domain gaps, ICADA is decomposed into two modules: a generative adversarial learning (GAL) module and an affinity transfer learning (ATL) module.
Specifically, the GAL module first trains a feature extractor on the labeled source data, constrained by a cross-entropy loss and a hard-batch triplet loss, so that the source knowledge is fully exploited and a basic discriminative model is obtained. To map the target data into the same space as the source domain and alleviate the global domain inconsistency, we further train the feature extractor by adversarial learning [15]. Unlike [9], [16], [17], which use a generative adversarial network (GAN) to train one or more discriminator–generator pairs to synthesize new images, then discard the pairs and train the feature extractor on the generated images, we share the weights of the feature extractor between source and target images and apply adversarial learning at the feature level. Compared with image-level GAN-based methods, feature-level adversarial learning saves computing resources on the one hand and avoids the severe noise produced by image synthesis on the other. In the ATL module, the trained feature extractor produces target pseudo-labels, from which confidence scores are inferred according to the similarity metric. We then design intra-camera and inter-camera affinity matrices to remove noisy pseudo-labels based on the ranked confidence scores, and optimize the network with intra-camera and inter-camera losses constructed from these matrices. Benefiting from the proposed GAL and ATL modules, ICADA mitigates the domain discrepancy between source and target, and removes noisy pseudo-labels by narrowing the distribution gaps among target cameras.
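The feature-level adversarial objective of the GAL module can be illustrated with a small sketch. This is not ICADA's exact loss, only the standard binary cross-entropy GAN formulation it builds on: a discriminator is trained to output 1 for source features and 0 for target features, while the shared feature extractor (the "generator") is updated so that target features are classified as source.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce(p, y):
    """Binary cross-entropy for a single probability p against label y."""
    eps = 1e-12
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))

def adversarial_losses(src_logits, tgt_logits):
    """Compute the two sides of the feature-level min-max game.

    src_logits / tgt_logits are the discriminator's raw outputs on
    source / target feature batches.  d_loss trains the discriminator
    to separate the domains; g_loss trains the feature extractor to
    make target features indistinguishable from source ones.
    """
    d_loss = sum(bce(sigmoid(z), 1.0) for z in src_logits) / len(src_logits) \
           + sum(bce(sigmoid(z), 0.0) for z in tgt_logits) / len(tgt_logits)
    g_loss = sum(bce(sigmoid(z), 1.0) for z in tgt_logits) / len(tgt_logits)
    return d_loss, g_loss
```

At convergence the discriminator is maximally confused (outputs near 0.5 on both domains), which is exactly when the shared extractor has produced domain-invariant features.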
We evaluate the proposed model on the widely used Market1501 [18], DukeMTMC-ReID [19], and MSMT17 [9] datasets. Extensive experiments show that each component of our approach contributes to the re-ID performance, and the full model combining both modules performs best.
In summary, the main contributions of our ICADA are as follows:
- We propose a novel UDA model named ICADA for person re-ID. It simultaneously considers both the domain discrepancy between the source and target domains and the distribution gaps among target cameras, via the generative adversarial learning (GAL) and affinity transfer learning (ATL) modules.
- We design a generative adversarial learning (GAL) module that bridges the domain gap between the source and target datasets at the feature level, which saves computational resources and reduces noise compared with GAN-based methods. The generator produces domain-invariant feature representations for source and target person images, and a discriminator strengthens the generator's feature learning capability. Besides, GAL employs cross-entropy and triplet losses to make full use of the labeled source images.
- We develop an affinity transfer learning (ATL) module to exploit inherent similarities among cross-camera person images in the target domain. It constructs intra-camera and inter-camera affinity matrices to filter out noisy pseudo-labels and develops the corresponding losses.
- We conduct ablation studies and quantitatively validate the effectiveness of ICADA on three commonly used datasets; the results show excellent performance on UDA-based person re-ID, surpassing recent advanced models.
Related work
Unsupervised domain adaptation. This work is relevant to unsupervised domain adaptation (UDA), a branch of transfer learning that aims to learn a target model given labeled source data and unlabeled target data. In the literature, most unsupervised domain adaptation approaches mitigate the distribution discrepancy between domains. Several methods achieve this by reducing the Maximum Mean Discrepancy (MMD) [20] between domains [21], [22], [23], [24]. Yan et al. [24] integrate
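The MMD mentioned above measures the distance between two distributions through kernel mean embeddings; minimizing it pulls source and target feature distributions together. A minimal sketch of the (biased) squared-MMD estimate with an RBF kernel, under a hypothetical bandwidth `gamma`:

```python
import math

def rbf(x, y, gamma=1.0):
    """Gaussian RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * d2)

def mmd2(xs, ys, gamma=1.0):
    """Biased estimate of the squared MMD between samples xs and ys.

    MMD^2 = E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)]; it is zero when
    the two sample sets come from the same distribution.
    """
    m, n = len(xs), len(ys)
    kxx = sum(rbf(a, b, gamma) for a in xs for b in xs) / (m * m)
    kyy = sum(rbf(a, b, gamma) for a in ys for b in ys) / (n * n)
    kxy = sum(rbf(a, b, gamma) for a in xs for b in ys) / (m * n)
    return kxx + kyy - 2 * kxy
```

In MMD-based UDA methods, this quantity is computed on the features of a source batch and a target batch and added to the training loss.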
Method
This paper aims to integrate the unsupervised domain adaptation (UDA) technique into the person re-identification (re-ID) task. In cross-domain person re-identification, labeled person images $\{(x_i^s, y_i^s)\}$ from the source domain and unlabeled person images $\{x_j^t\}$ from the target domain are given, where $x_i^s$ / $x_j^t$ denotes a source/target image and $y_i^s$ is the label corresponding to $x_i^s$.
Generally, the images from different datasets lie in different distributions, and
Data and evaluation metrics
This work evaluates the ICADA on three popular person re-ID datasets, including Market1501 [18], DukeMTMC-ReID [19], and MSMT17 [9].
Market1501 contains 32,668 pedestrian images of 1501 identities, captured from six camera views. The dataset is partitioned into 12,936 images of 751 persons for training and the remaining 19,732 images of 750 persons for testing.
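The splits above are scored with the standard re-ID metrics, mean average precision (mAP) and CMC rank-k. As a simplified illustration (it omits the usual junk filtering of same-camera/same-identity gallery entries), per-query AP and rank-1 can be computed like this:

```python
def average_precision(query_id, gallery_ids, scores):
    """AP for one query: rank the gallery by descending similarity and
    average the precision at each position where a true match occurs."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, precisions = 0, []
    for rank, i in enumerate(order, start=1):
        if gallery_ids[i] == query_id:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / hits if hits else 0.0

def rank1(query_id, gallery_ids, scores):
    """True if the top-ranked gallery image shares the query identity."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return gallery_ids[best] == query_id
```

mAP is the mean of `average_precision` over all queries, and CMC rank-k generalizes `rank1` to a hit anywhere in the top k.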
DukeMTMC-ReID is collected from 8 non-overlapping camera views, comprising 16,522 pedestrian images of 702 identities (training data),
Conclusion
This paper proposes an intra–inter camera affinity matrix based unsupervised domain adaptation (UDA) framework for the person re-ID task. The generative adversarial learning module fully exploits the labeled source data and bridges the domain gap between the source and target domains. The affinity transfer learning module leverages the extracted target features to perform clustering using the similarity metric and filters out noisy pseudo-labels by ranking the computed confidence scores based on the trained feature
Uncited references
[64]
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This research has been financed by the National Natural Science Foundation of China ("Error analysis and control of semi-algebraic model detection method", 61772006), the Science and Technology Major Project of Guangxi, China ("Research and Application Demonstration of Key Technologies for Intelligent Ship Networking in Beibu Gulf", AA17204096), the Key Research and Development Project of Guangxi ("DPA-proof full asynchronous RSA security crypto chip: design methods, tools and prototypes", AB17129012),
References (64)
- et al., Unsupervised domain adaptive re-identification: Theory and practice, Pattern Recognit. (2020)
- G. Wang, Y. Yuan, X. Chen, J. Li, X. Zhou, Learning discriminative features with multiple granularities for person...
- Y. Sun, L. Zheng, Y. Yang, Q. Tian, S. Wang, Beyond part models: Person retrieval with refined part pooling (and a...
- et al., AlignedReID: Surpassing human-level performance in person re-identification (2017)
- Z. Zhang, C. Lan, W. Zeng, X. Jin, Z. Chen, Relation-aware global attention for person re-identification, in:...
- et al., A strong baseline and batch normalization neck for deep person re-identification, IEEE Trans. Multimed. (2019)
- G. Chen, C. Lin, L. Ren, J. Lu, J. Zhou, Self-critical attention learning for person re-identification, in: Proceedings...
- A. Wu, W.-S. Zheng, J.-H. Lai, Unsupervised person re-identification by camera-aware similarity consistency learning,...
- et al., Multi-task mid-level feature alignment network for unsupervised cross-dataset person re-identification (2018)
- L. Wei, S. Zhang, W. Gao, Q. Tian, Person transfer GAN to bridge domain gap for person re-identification, in:...
- Unsupervised multi-source domain adaptation for person re-identification
- Performance measures and a data set for multi-target, multi-camera tracking
- A kernel method for the two-sample problem
- Unsupervised domain adaptation with residual transfer networks
- Deep domain confusion: Maximizing for domain invariance
- Learning transferable features with deep adaptation networks
- Generative adversarial networks
- Unsupervised domain adaptation by backpropagation
- Joint disentangling and adaptation for cross-domain person re-identification
- Deep credible metric learning for unsupervised domain adaptation person re-identification
- Self-paced contrastive learning with hybrid memory for domain adaptive object re-id
☆ This paper has been recommended for acceptance by Zicheng Liu.