Unsupervised person re-identification by Intra–Inter Camera Affinity Domain Adaptation☆
Introduction
Person re-identification (re-ID) aims to retrieve images of a specified person across multiple cameras positioned at different locations. Re-ID is widely used in intelligent video surveillance, city management, criminal tracing, etc. It is a challenging computer-vision task, since significant variations in body pose, view angle, and illumination across views can make two images of the same person look highly different, and vice versa. Thanks to the successful application of deep learning, impressive progress has been made on person re-identification under the supervised framework [1], [2], [3], [4], [5], [6]. Nevertheless, when tested on unseen datasets, supervised models usually suffer dramatic performance degradation because of the domain discrepancy caused by variations in background, illumination, camera placement, etc. Gathering and labeling sufficient training images for each specific scene would address this, but doing so is laborious and costly.
To address this problem, increasing efforts are being made on unsupervised domain adaptation (UDA) based person re-ID, which aims to adapt a model to an unlabeled target dataset by exploiting information from a labeled source dataset. These UDA person re-ID methods focus on two aspects: (a) applying domain adaptation techniques to reduce the inconsistency between source and target distributions [7], [8], [9], [10], [11]; (b) assigning the same pseudo-labels to similar target instances via clustering, k-NN search, etc., and training with them [12], [13], [14]. The former alleviates domain inconsistency between source and target, while the latter exploits the potential relations among target instances and retrains the model with the assigned pseudo-labels. The latter, pseudo-label-based branch dominates the state of the art. These methods usually consist of two steps: (1) obtaining a preliminary discriminative model by supervised training on the annotated source data; (2) iteratively predicting pseudo-labels on the target domain and fine-tuning the model with them.
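The two-step pipeline above hinges on how pseudo-labels are assigned. The following is a minimal, dependency-free sketch, not the authors' implementation: it mimics density-based clustering (in the spirit of DBSCAN) by linking L2-normalized features that lie within a distance `eps` of each other and marking components smaller than `min_pts` as noise (label `-1`), which are then excluded from fine-tuning. The thresholds are hypothetical.

```python
import math

def l2_normalize(v):
    """Project a feature vector onto the unit sphere."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def assign_pseudo_labels(features, eps=0.6, min_pts=2):
    """Greedy density-style clustering of target features.

    Samples whose normalized features are within `eps` are linked into
    one component; components with fewer than `min_pts` members are
    treated as noisy and receive the pseudo-label -1.
    """
    feats = [l2_normalize(f) for f in features]
    n = len(feats)
    labels = [-2] * n          # -2 marks "unvisited"
    cluster = 0
    for i in range(n):
        if labels[i] != -2:
            continue
        stack, members = [i], []
        labels[i] = cluster
        while stack:            # grow the connected component of i
            j = stack.pop()
            members.append(j)
            for k in range(n):
                if labels[k] == -2 and math.dist(feats[j], feats[k]) < eps:
                    labels[k] = cluster
                    stack.append(k)
        if len(members) < min_pts:   # too small: discard as noise
            for j in members:
                labels[j] = -1
        else:
            cluster += 1
    return labels
```

In practice the features would come from the source-pretrained extractor, and clustering would be re-run at every fine-tuning round as the feature space improves.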
Despite its promising performance for person re-ID, the clustering-based branch is mainly limited by clustering accuracy, which cannot always be guaranteed. The assigned pseudo-labels for target samples can therefore be noisy, and training the model with them inevitably degrades performance. The main reason is that each identity can be recorded by multiple cameras with different parameters and environments, which may significantly change the identity's appearance. In other words, the distribution gaps among cameras make it difficult both to identify samples of the same identity and to optimize intra-class feature similarity. Thus, not only the domain gap between the source and target domains but also the distribution gaps between target cameras need to be alleviated for the UDA person re-ID task. However, existing UDA person re-ID methods cannot address these two problems simultaneously, so their cross-domain re-ID performance is limited.
In this study, we address these problems with a novel unsupervised domain adaptation model named Intra–Inter Camera Affinity Domain Adaptation (ICADA). To better remove noisy pseudo-labels and bridge domain gaps, ICADA is decomposed into two modules: a generative adversarial learning (GAL) module and an affinity transfer learning (ATL) module.
Specifically, the GAL module first trains a feature extractor on the labeled source data, constrained by a cross-entropy loss and a hard-batch triplet loss, so that the source knowledge is fully exploited and a basic discriminative model is obtained. To map the target data into the same space as the source domain and alleviate the global domain inconsistency, we further train the feature extractor by adversarial learning [15]. Unlike [9], [16], [17], which use a generative adversarial network (GAN) to train one or more discriminator–generator pairs to synthesize new images, then discard the pairs and train the feature extractor on the generated images, we share the weights of the feature extractor between source and target images and apply adversarial learning at the feature level. Compared with image-level GAN-based methods, feature-level adversarial learning saves computing resources on the one hand and avoids the severe noise produced by image synthesis on the other. In the ATL module, the trained feature extractor produces target pseudo-labels, from which confidence scores are inferred according to the similarity metric. We then design intra-camera and inter-camera affinity matrices to remove noisy pseudo-labels based on the ranked confidence scores, and optimize the network with intra-camera and inter-camera losses constructed from these matrices. Benefiting from the proposed GAL and ATL modules, ICADA mitigates the domain discrepancy between source and target, and removes noisy pseudo-labels by narrowing the distribution gaps among target cameras.
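The feature-level adversarial objective of the GAL module can be illustrated with a small sketch. This is not ICADA's exact loss, only the standard binary cross-entropy GAN formulation it builds on: a discriminator is trained to output 1 for source features and 0 for target features, while the shared feature extractor (the "generator") is updated so that target features are classified as source.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce(p, y):
    """Binary cross-entropy for a single probability p against label y."""
    eps = 1e-12
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))

def adversarial_losses(src_logits, tgt_logits):
    """Compute the two sides of the feature-level min-max game.

    src_logits / tgt_logits are the discriminator's raw outputs on
    source / target feature batches.  d_loss trains the discriminator
    to separate the domains; g_loss trains the feature extractor to
    make target features indistinguishable from source ones.
    """
    d_loss = sum(bce(sigmoid(z), 1.0) for z in src_logits) / len(src_logits) \
           + sum(bce(sigmoid(z), 0.0) for z in tgt_logits) / len(tgt_logits)
    g_loss = sum(bce(sigmoid(z), 1.0) for z in tgt_logits) / len(tgt_logits)
    return d_loss, g_loss
```

At convergence the discriminator is maximally confused (outputs near 0.5 on both domains), which is exactly when the shared extractor has produced domain-invariant features.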
We evaluate the proposed model on the widely used Market1501 [18], DukeMTMC-ReID [19], and MSMT17 [9] datasets. Extensive experiments show that each component of our approach contributes to the re-ID performance, and the full model combining both modules performs best.
In summary, the main contributions of our ICADA are as follows:
- We propose a novel UDA model named ICADA for person re-ID. It simultaneously considers both the domain discrepancy between the source and target domains and the distribution gaps among target cameras, via the generative adversarial learning (GAL) and affinity transfer learning (ATL) modules.
- We design a generative adversarial learning (GAL) module that bridges the domain gap between the source and target datasets at the feature level, which saves computational resources and reduces noise compared with GAN-based methods. The generator produces domain-invariant feature representations for source and target person images, and a discriminator strengthens the generator's feature learning capability. Besides, GAL employs cross-entropy and triplet losses to make full use of the labeled source images.
- We develop an affinity transfer learning (ATL) module to exploit inherent similarities among cross-camera person images in the target domain. It constructs intra-camera and inter-camera affinity matrices to filter out noisy pseudo-labels and develops the corresponding losses.
- We conduct ablation studies and quantitatively validate the effectiveness of ICADA on three commonly used datasets; the results show excellent performance on UDA-based person re-ID, surpassing recent advanced models.
Related work
Unsupervised domain adaptation. This work is relevant to unsupervised domain adaptation (UDA), a branch of transfer learning that aims to learn a target model given labeled source data and unlabeled target data. In the literature, most unsupervised domain adaptation approaches mitigate the distribution discrepancy between domains. Several methods achieve this by reducing the Maximum Mean Discrepancy (MMD) [20] between domains [21], [22], [23], [24]. Yan et al. [24] integrate
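The MMD mentioned above measures the distance between two distributions through kernel mean embeddings; minimizing it pulls source and target feature distributions together. A minimal sketch of the (biased) squared-MMD estimate with an RBF kernel, under a hypothetical bandwidth `gamma`:

```python
import math

def rbf(x, y, gamma=1.0):
    """Gaussian RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * d2)

def mmd2(xs, ys, gamma=1.0):
    """Biased estimate of the squared MMD between samples xs and ys.

    MMD^2 = E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)]; it is zero when
    the two sample sets come from the same distribution.
    """
    m, n = len(xs), len(ys)
    kxx = sum(rbf(a, b, gamma) for a in xs for b in xs) / (m * m)
    kyy = sum(rbf(a, b, gamma) for a in ys for b in ys) / (n * n)
    kxy = sum(rbf(a, b, gamma) for a in xs for b in ys) / (m * n)
    return kxx + kyy - 2 * kxy
```

In MMD-based UDA methods, this quantity is computed on the features of a source batch and a target batch and added to the training loss.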
Method
This paper aims to integrate the unsupervised domain adaptation (UDA) technique into the person re-identification (re-ID) task. In cross-domain person re-identification, labeled person images $\{(x_i^s, y_i^s)\}$ from the source domain and unlabeled person images $\{x_j^t\}$ from the target domain are given, where $x_i^s$ / $x_j^t$ denotes a source/target image and $y_i^s$ is the label corresponding to $x_i^s$.
Generally, the images from different datasets lie in different distributions, and
Data and evaluation metrics
This work evaluates the ICADA on three popular person re-ID datasets, including Market1501 [18], DukeMTMC-ReID [19], and MSMT17 [9].
Market1501 contains 32,668 pedestrian images of 1501 identities, captured from six camera views. The dataset is partitioned into 12,936 images of 751 persons for training and the remaining 19,732 images of 750 persons for testing.
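The splits above are scored with the standard re-ID metrics, mean average precision (mAP) and CMC rank-k. As a simplified illustration (it omits the usual junk filtering of same-camera/same-identity gallery entries), per-query AP and rank-1 can be computed like this:

```python
def average_precision(query_id, gallery_ids, scores):
    """AP for one query: rank the gallery by descending similarity and
    average the precision at each position where a true match occurs."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, precisions = 0, []
    for rank, i in enumerate(order, start=1):
        if gallery_ids[i] == query_id:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / hits if hits else 0.0

def rank1(query_id, gallery_ids, scores):
    """True if the top-ranked gallery image shares the query identity."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return gallery_ids[best] == query_id
```

mAP is the mean of `average_precision` over all queries, and CMC rank-k generalizes `rank1` to a hit anywhere in the top k.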
DukeMTMC-ReID is collected from 8 non-overlapping camera views, comprising 16,522 pedestrian images of 702 identities (training data),
Conclusion
This paper proposes an intra–inter camera affinity matrix based unsupervised domain adaptation (UDA) framework for the person re-ID task. The generative adversarial learning module fully exploits the labeled source data and bridges the domain gap between the source and target domains. The affinity transfer learning module leverages the extracted target features to perform clustering using the similarity metric and filters out noisy pseudo-labels by ranking the computed confidence scores based on the trained feature
Uncited references
[64]
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This research has been financed by the National Natural Science Foundation of China ("Error analysis and control of semi-algebraic model detection method", 61772006), the Science and Technology Major Project of Guangxi, China ("Research and Application Demonstration of Key Technologies for Intelligent Ship Networking in Beibu Gulf", AA17204096), the Key Research and Development Project of Guangxi ("DPA-proof full asynchronous RSA security crypto chip: design methods, tools and prototypes", AB17129012),
References (64)
- et al., Unsupervised domain adaptive re-identification: Theory and practice, Pattern Recognit. (2020)
- G. Wang, Y. Yuan, X. Chen, J. Li, X. Zhou, Learning discriminative features with multiple granularities for person...
- Y. Sun, L. Zheng, Y. Yang, Q. Tian, S. Wang, Beyond part models: Person retrieval with refined part pooling (and a...
- et al., AlignedReID: Surpassing human-level performance in person re-identification (2017)
- Z. Zhang, C. Lan, W. Zeng, X. Jin, Z. Chen, Relation-aware global attention for person re-identification, in:...
- et al., A strong baseline and batch normalization neck for deep person re-identification, IEEE Trans. Multimed. (2019)
- G. Chen, C. Lin, L. Ren, J. Lu, J. Zhou, Self-critical attention learning for person re-identification, in: Proceedings...
- A. Wu, W.-S. Zheng, J.-H. Lai, Unsupervised person re-identification by camera-aware similarity consistency learning,...
- et al., Multi-task mid-level feature alignment network for unsupervised cross-dataset person re-identification (2018)
- L. Wei, S. Zhang, W. Gao, Q. Tian, Person transfer GAN to bridge domain gap for person re-identification, in:...
- Unsupervised multi-source domain adaptation for person re-identification
- Performance measures and a data set for multi-target, multi-camera tracking
- A kernel method for the two-sample problem
- Unsupervised domain adaptation with residual transfer networks
- Deep domain confusion: Maximizing for domain invariance
- Learning transferable features with deep adaptation networks
- Generative adversarial networks
- Unsupervised domain adaptation by backpropagation
- Joint disentangling and adaptation for cross-domain person re-identification
- Deep credible metric learning for unsupervised domain adaptation person re-identification
- Self-paced contrastive learning with hybrid memory for domain adaptive object re-id
☆ This paper has been recommended for acceptance by Zicheng Liu.