Neighbor similarity and soft-label adaptation for unsupervised cross-dataset person re-identification
Introduction
Person re-identification (Re-ID) plays an important role in surveillance video analysis because of its wide range of real-world applications, such as searching for a target person and analyzing the trajectories of crowd flow. Given a query pedestrian image, the task aims at matching the same pedestrian across multiple non-overlapping cameras. Although re-id methods take various forms, they share the common goal of learning an optimal mapping from image space to feature space, which pulls images of the same identity close to each other while pushing those of different identities apart in the learned feature space.
Deep neural networks (DNNs) have shown prominent advantages in representation learning and have proven highly effective in supervised person re-identification [1], [4], [7], [21], [32], [33], [41], [46], [47]. Given a manually labeled identity for each image, an objective function based on similarity (e.g. pairwise [33], triplet [25] or quadruplet [3] loss) or classification (e.g. Softmax [39], [49] or OIM [40] loss) is applied to train a DNN model, which learns a feature representation of person images. However, manual annotation requires expensive human labor, especially in long-period multi-camera scenarios. For each person appearing in one camera, an annotator must search all other cameras to find out whether the person appears again. This annotation cost limits the application and expansion of supervised re-id methods in large-scale real-world scenarios.
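As a minimal sketch of the similarity-based objectives mentioned above, the following Python snippet computes a triplet loss on plain feature vectors. The function names, the margin of 0.3, and the use of Euclidean distance are illustrative assumptions and are not taken from the cited methods.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge-style triplet loss.

    The loss is zero once the anchor is closer to the positive (same identity)
    than to the negative (different identity) by at least `margin`.
    The margin value is an assumption chosen for illustration.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)
```

A triplet whose positive is already much closer than its negative contributes no loss, so training focuses on the triplets that still violate the margin.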
Hence, unsupervised cross-dataset re-id algorithms have been proposed in recent years. Given an annotated source dataset, this task aims at learning a discriminative feature representation on the target dataset without any labels. This is a challenging problem due to the gap between the source and target domains, including camera viewpoints, image quality, and changes in dressing style across regions and seasons. A common approach to cross-dataset problems is unsupervised domain adaptation, but this approach assumes that the source and target domains share the same set of classes. This assumption does not hold for person re-id because the source and target datasets usually contain entirely different identities. A few methods [23], [34] assume that the datasets share the same semantic attributes, which can be learned from the source domain and transferred to the target domain. Other methods [6], [36], [52] train an image-to-image translation model and generate images with identities on the target domain.
In this work, we follow the self-supervised methods [9], [20] and propose Neighbor Similarity and Soft-label Adaptation (NSSA), a simple yet effective algorithm for the unsupervised cross-dataset person re-id problem. The method is illustrated in Fig. 1. First, a feature representation model is trained on the labeled source dataset and then applied to the target domain to extract features. Because the target dataset has no supervised labels, the self-supervised mechanism relies on the initial feature distances to obtain similar pairs. Besides the commonly used Euclidean distance, we introduce a fused distance metric, including inner-domain neighbor similarity and inter-domain soft-label, to achieve a better similarity metric. We then select samples under the assumption that a feature pair with a smaller distance has a higher probability of sharing the same identity label. The most credible samples, which are associated by pseudo-ids, are selected to fine-tune the model. These steps are performed iteratively to incrementally improve the discriminability of the model.
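One simple way to turn a set of selected similar pairs into pseudo-ids is to group the samples that those pairs connect. The sketch below does this with a union-find structure. It is an assumption made for illustration: this section does not specify how NSSA assigns pseudo-ids, and the function name `assign_pseudo_ids` is hypothetical.

```python
def assign_pseudo_ids(num_samples, similar_pairs):
    """Group samples linked by similar pairs and give each group a pseudo-id.

    Only samples that appear in at least one selected pair receive a
    pseudo-id; the remaining samples stay unlabeled for this iteration.
    Returns a dict mapping sample index -> pseudo-id (0, 1, 2, ...).
    """
    parent = list(range(num_samples))

    def find(x):
        # Follow parent links to the group root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the groups of every selected pair.
    for i, j in similar_pairs:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[max(root_i, root_j)] = min(root_i, root_j)

    # Number the groups in order of their smallest member.
    root_to_id = {}
    pseudo_ids = {}
    for sample in sorted({s for pair in similar_pairs for s in pair}):
        root = find(sample)
        pseudo_ids[sample] = root_to_id.setdefault(root, len(root_to_id))
    return pseudo_ids
```

For example, with six target samples and the pairs (0, 1), (1, 2) and (4, 5), samples 0 to 2 share one pseudo-id, samples 4 and 5 share another, and sample 3 is left out of the fine-tuning set.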
It is worth noting that the fused distance metric in our method consists of three components: (1) The vanilla Euclidean distance between features, which is the only metric used in previous self-supervised cross-dataset re-id methods [9], [20]. (2) The inner-domain neighbor similarity, which explores the topological relationship between feature points and exploits the prior that features of the same identity should be close to each other and have similar neighbors. (3) The inter-domain soft-label, which spreads identity labels from the source domain to the target domain according to the cross-domain point-wise distance. The soft-label provides effective supervisory information on the unlabeled target domain.
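The three components can be sketched in Python as follows. The exact formulas are not given in this section, so every concrete choice here is an assumption made for illustration: neighbor similarity is measured by the Jaccard overlap of k-nearest-neighbor sets, soft-labels are a softmax over negative distances to source identity centers, and the three terms are combined by a weighted sum with hypothetical weights `w_nbr` and `w_soft`.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_sets(features, k):
    """For each point, the index set of its k nearest points (itself included)."""
    n = len(features)
    return [
        set(sorted(range(n), key=lambda j: euclidean(features[i], features[j]))[:k])
        for i in range(n)
    ]


def neighbor_distance(nbrs_i, nbrs_j):
    """1 - Jaccard overlap of two neighbor sets; 0 means identical neighbors."""
    union = nbrs_i | nbrs_j
    if not union:
        return 0.0
    return 1.0 - len(nbrs_i & nbrs_j) / len(union)


def soft_label(feature, source_centers, temperature=1.0):
    """Distribution over source identities: softmax of negative distances."""
    logits = [-euclidean(feature, c) / temperature for c in source_centers]
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_label_distance(p, q):
    """Half the L1 distance between two distributions, so it lies in [0, 1]."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def fused_distance(target, source_centers, k=3, w_nbr=1.0, w_soft=1.0):
    """Pairwise fused distance matrix over the target features (assumed form)."""
    nbrs = knn_sets(target, k)
    soft = [soft_label(f, source_centers) for f in target]
    n = len(target)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = (euclidean(target[i], target[j])
                 + w_nbr * neighbor_distance(nbrs[i], nbrs[j])
                 + w_soft * soft_label_distance(soft[i], soft[j]))
            dist[i][j] = dist[j][i] = d
    return dist
```

Under these assumptions, two target features that share most of their neighbors and receive similar soft-labels end up closer than their Euclidean distance alone would suggest, relative to pairs that differ in both respects.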
The unlabeled data samples are progressively brought into the training schedule. The initial model is suboptimal on the target dataset, hence not all feature points can be assigned an accurate pseudo-id in the first iterations. Therefore, only the samples with higher confidence are selected to fine-tune the model. As the model improves during training, more credible samples are added to the training set.
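The progressive selection described above can be sketched as keeping only the closest pairs and enlarging that share at every iteration. The linear schedule, its parameters, and the function names below are assumptions for illustration; the actual selection rule of NSSA is not given in this section.

```python
def selection_fraction(iteration, start=0.1, step=0.1, cap=1.0):
    """Share of candidate pairs kept at a given iteration (assumed linear schedule)."""
    return min(cap, start + step * iteration)


def select_credible_pairs(dist, fraction):
    """Return the `fraction` of index pairs (i, j), i < j, with the smallest distance.

    `dist` is a symmetric pairwise distance matrix given as a list of lists.
    At least one pair is always returned.
    """
    n = len(dist)
    candidates = sorted(
        (dist[i][j], i, j) for i in range(n) for j in range(i + 1, n)
    )
    keep = max(1, int(fraction * len(candidates)))
    return [(i, j) for _, i, j in candidates[:keep]]
```

Early iterations therefore fine-tune on a small set of pairs that the current model is most confident about, and later iterations admit pairs with larger distances once the features have improved.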
The remainder of the paper is organized as follows: Related works are reviewed in Section 2. In Section 3, we discuss the detailed implementation of our proposed model. The experimental analysis and comparison with the state-of-the-art methods are presented in Section 4.
Related works
Work related to the proposed method falls into three categories: unsupervised person re-id, semi-supervised learning, and curriculum learning. We explain the connections and differences between NSSA and these methods in each category.
Overview
We will introduce the proposed Neighbor Similarity and Soft-label Adaptation algorithm in this section. The architecture of the method is illustrated in Fig. 2, which contains 4 steps:
- Step (1). Model pre-training by supervised learning on the labeled source dataset. In this step, a person feature embedding model is trained by classification loss and domain adaptation loss. However, it is suboptimal on the target dataset due to the domain gap, so it needs to be fine-tuned.
- Step (2). Feature
Datasets and settings
Datasets and evaluation protocol. We evaluate our proposed model on three person re-id datasets: Market-1501 [48], DukeMTMC [31] and MSMT17 [37]. Market-1501 contains 32,668 labeled images of 1501 people, captured by 6 cameras. The standard training/test split (751 / 750 ids) and the single-query setting are adopted in our experiments. DukeMTMC has 8 cameras and 1404 identities with 36,411 images. Half of the identities are used for training and the other half for testing. MSMT17 has 4101
Conclusion
In this paper, we propose a simple yet effective algorithm, NSSA, for unsupervised cross-dataset person re-identification. We introduce the inner-domain neighbor similarity and the inter-domain soft-label adaptation to explore the topological relationship between features in addition to the vanilla Euclidean distance. The representation ability of the feature model improves through the iterative training process. Extensive experimental results on three real-world datasets demonstrate the advantage of the proposed model over
Declaration of Competing Interest
We declare that we have no financial or personal relationships with other people or organizations that can inappropriately influence our work, and that there is no professional or other personal interest of any nature or kind in any product, service and/or company that could be construed as influencing the position presented in, or the review of, the manuscript entitled.
Acknowledgements
This paper is partially supported by NSFC (No. 61772330, 61533012, 61876109), the pre-research project (No. 61403120201), Shanghai authentication key Lab. (2017XCWZK01), and Technology Committee the interdisciplinary Program of Shanghai Jiao Tong University (YG2019QNA09).
Yiru Zhao received the B.S. degree in computer science from Tongji University, China, in 2015. He is currently pursuing Ph.D. degree in Shanghai Jiao Tong University, China. His research interests include deep learning, image retrieval and machine learning.
References (56)
- et al. Deep feature learning with relative distance comparison for person re-identification. Pattern Recognit. (2015)
- et al. Enhancing person re-identification in a self-trained subspace. ACM Trans. Multimed. Comput. Commun. Appl. (TOMM) (2017)
- et al. An improved deep learning architecture for person re-identification. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015)
- et al. Curriculum learning. Proceedings of the 26th Annual International Conference on Machine Learning (2009)
- et al. Beyond triplet loss: a deep quadruplet network for person re-identification. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
- et al. Person re-identification by multi-channel parts-based CNN with improved triplet loss function. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016)
- et al. Custom pictorial structures for re-identification. Proceedings of the BMVC (2011)
- et al. Image-image domain adaptation with preserved self-similarity and domain-dissimilarity for person reidentification. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018)
- et al. A density-based algorithm for discovering clusters in large spatial databases with noise. Proceedings of the KDD (1996)
- et al. Unsupervised person re-identification: clustering and fine-tuning. ACM Trans. Multimed. Comput. Commun. Appl. (TOMM) (2018)
- Person re-identification by symmetry-driven accumulation of local features. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Generative adversarial nets. Advances in Neural Information Processing Systems
- A kernel method for the two-sample-problem. Advances in Neural Information Processing Systems
- Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
- Self-paced learning with diversity. Advances in Neural Information Processing Systems
- Person re-identification by unsupervised l1 graph learning. Proceedings of the European Conference on Computer Vision
- Dictionary learning with iterative Laplacian regularisation for unsupervised person re-identification. Proceedings of the BMVC
- ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems
- Self-paced learning for latent variable models. Advances in Neural Information Processing Systems
- Unsupervised person re-identification by deep learning tracklet association. Proceedings of the European Conference on Computer Vision
- DeepReID: Deep filter pairing neural network for person re-identification. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
- Adaptation and re-identification network: an unsupervised deep transfer learning approach to person re-identification. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops
- Person re-identification by iterative re-weighted sparse ranking. IEEE Trans. Pattern Anal. Mach. Intell.
- End-to-end comparative attention networks for person re-identification. IEEE Trans. Image Process.
- Semi-supervised coupled dictionary learning for person re-identification. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
- Learning transferable features with deep adaptation networks. Proceedings of the International Conference on Machine Learning
Hongtao Lu is now a Professor in the Department of Computer Science and Engineering, Shanghai Jiao Tong University, China. His current research interests include computer vision, deep learning and machine learning. He had authored or co-authored more than 100 papers in journals and premier conferences.