The devil in the tail: Cluster consolidation plus cluster adaptive balancing loss for unsupervised person re-identification
Introduction
Person re-identification (Re-ID) aims to track or match pedestrians across multiple camera views and locations under varying posture, illumination, weather conditions, and viewpoints. It is valuable for public security and criminal tracking. Most existing person Re-ID methods train deep neural networks on manually labeled images. Unfortunately, the generalization performance of the trained networks degrades seriously on open-world data [1], [2], [3], [4]. It is therefore crucial to remove the heavy dependency on manually labeled data, which creates an urgent need for unsupervised person Re-ID [2].
Unsupervised person Re-ID has attracted considerable research attention in recent years. The primary idea is to first run a clustering algorithm to generate pseudo labels and then train a Convolutional Neural Network (CNN) with those pseudo labels. For instance, hierarchical clustering is used by Zeng et al. [5] and Lin et al. [6] to generate pseudo labels, and k-means and DBSCAN [7] are jointly used by Ge et al. [3] to perform unsupervised domain adaptation. In practice, how well the CNN learns effective features depends on the quality of the generated pseudo labels. Unfortunately, the generated pseudo labels a) are contaminated with mistakes in larger clusters, and b) contain a number of tiny clusters or even singletons. Mistakes in the pseudo labels provide wrong supervision to the network and thus degrade performance. In addition, the pseudo labels are heavily imbalanced because larger clusters, tiny clusters, and singletons occur together, which makes training the network more challenging.
In previous works, singletons are viewed as noisy samples and are therefore usually not used for training the network [4]. However, singletons also carry potentially rich information for training, and simply ignoring them wastes data. Some attempts regard each singleton as an extra independent class and jointly train the network with the data of all clusters [3], [8]. In these methods, however, the pseudo labels are used with equal importance, whether they come from larger clusters, tiny clusters, or singletons. This does not make good use of the supervision information provided by pseudo labels from clusters of different sizes.
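The singleton-as-class strategy described above can be sketched in a few lines. This is a minimal illustration, not code from any of the cited methods: it assumes the clustering step marks outliers with the id -1 (the convention used, for example, by scikit-learn's DBSCAN), and the helper names `to_pseudo_labels` and `size_histogram` are hypothetical.

```python
from collections import Counter

def to_pseudo_labels(cluster_ids):
    """Map raw cluster ids to pseudo labels, giving each outlier (-1) its own class."""
    # Re-index real clusters to 0..K-1 in order of first appearance.
    remap = {}
    for cid in cluster_ids:
        if cid != -1 and cid not in remap:
            remap[cid] = len(remap)
    next_label = len(remap)
    labels = []
    for cid in cluster_ids:
        if cid == -1:
            labels.append(next_label)  # singleton: a new class of size one
            next_label += 1
        else:
            labels.append(remap[cid])
    return labels

def size_histogram(labels):
    """Return a dict mapping cluster size -> number of clusters of that size."""
    sizes = Counter(labels)
    return dict(sorted(Counter(sizes.values()).items()))

# Toy clustering output (assumed data): one large cluster, one tiny cluster, three outliers.
raw = [0, 0, 0, 0, 0, 1, 1, -1, -1, 0, -1]
pseudo = to_pseudo_labels(raw)
hist = size_histogram(pseudo)
print(pseudo)  # [0, 0, 0, 0, 0, 1, 1, 2, 3, 0, 4]
print(hist)    # {1: 3, 2: 1, 6: 1}
```

Even in this toy case the size histogram is skewed: three classes of size one sit next to a single class of size six, which is the kind of imbalance the text refers to.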
In this paper, we focus on leveraging the supervision information provided by all clusters, whether larger clusters, tiny clusters, or singletons, to effectively train the network. The primary idea of our work consists of two components: a) to first improve the quality of the pseudo labels; and then b) to formulate the task of training the network with imbalanced pseudo labels as a long-tail learning problem.
Specifically, we propose a simple but effective approach to improve the quality of the pseudo labels, called Cluster Consolidation (CC), which excludes a small proportion of unreliable data points from the larger clusters. Doing so reduces the mistakes in the pseudo labels of larger clusters, but at the same time it generates more tiny clusters or singletons (the unreliable data points excluded from the larger clusters), resulting in a heavily imbalanced distribution of pseudo labels. To tackle this imbalance, we propose a Cluster Adaptive Balancing (CAB) loss, which automatically assigns balancing weights to the pseudo labels with respect to cluster size and confidence. In this way, the supervision information from larger clusters, tiny clusters, and singletons alike can be properly exploited to efficiently train the network for feature learning.
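To make the consolidation idea concrete, the sketch below prunes a fixed fraction of points from each sufficiently large cluster and relabels every pruned point as a new singleton. It is only an illustration under stated assumptions: this excerpt does not specify how reliability is measured, so distance to the cluster centroid is used here as a stand-in criterion, and the function name `consolidate` and the parameters `min_size` and `prune_ratio` are hypothetical.

```python
import math
from collections import defaultdict

def consolidate(features, labels, min_size=4, prune_ratio=0.25):
    """Prune the farthest points of each large cluster into new singleton classes.

    Assumption: "unreliable" is approximated by distance to the cluster centroid;
    the criterion used in the paper is not given in this excerpt.
    """
    members = defaultdict(list)
    for idx, lab in enumerate(labels):
        members[lab].append(idx)

    new_labels = list(labels)
    next_label = max(labels) + 1
    for lab, idxs in members.items():
        if len(idxs) < min_size:
            continue  # tiny clusters and singletons are left untouched
        dim = len(features[idxs[0]])
        centroid = [sum(features[i][d] for i in idxs) / len(idxs) for d in range(dim)]
        # Sort members from farthest to closest to the centroid.
        ranked = sorted(idxs, key=lambda i: math.dist(features[i], centroid), reverse=True)
        n_prune = int(len(idxs) * prune_ratio)
        for i in ranked[:n_prune]:
            new_labels[i] = next_label  # each pruned point becomes a singleton
            next_label += 1
    return new_labels

# Toy 2-D features (assumed data): cluster 0 contains one obvious outlier at (5, 5).
feats = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (9.0, 9.0), (9.1, 9.0)]
labs = [0, 0, 0, 0, 1, 1]
new_labs = consolidate(feats, labs)
print(new_labs)  # [0, 0, 0, 2, 1, 1]
```

The larger cluster becomes more compact, while the number of singleton classes grows by one, which mirrors the trade-off described above: fewer label mistakes in large clusters at the cost of a longer tail.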
Paper contributions. The contributions of the paper are highlighted as follows.
1. We propose a simple yet effective approach, called Cluster Consolidation (CC), to reorganize the clustering result. The reorganization step improves the compactness of larger clusters by pruning a proportion of unreliable samples into tiny clusters or singletons.
2. We propose a Cluster Adaptive Balancing (CAB) loss to effectively train the network by automatically assigning proper weights to the imbalanced and noisy pseudo labels. In this way, the unsupervised person Re-ID task is formulated as a cluster adaptive long-tail learning problem.
3. Extensive experiments on widely used benchmark datasets are conducted and demonstrate state-of-the-art performance. A set of ablation studies is also provided.
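As a rough illustration of what weighting pseudo labels by cluster size can look like, the sketch below applies a generic size-based re-weighting, of the kind common in long-tail learning, to a softmax cross-entropy. It is not the CAB loss: the CAB formulation, including how confidence enters the weights, is not given in this excerpt. The names `size_based_weights` and `weighted_cross_entropy` and the exponent `gamma` are assumptions made for this example.

```python
import math
from collections import Counter

def size_based_weights(labels, gamma=0.5):
    """Weight each class by 1 / size**gamma, rescaled so the class weights average to 1."""
    sizes = Counter(labels)
    raw = {c: 1.0 / (n ** gamma) for c, n in sizes.items()}
    mean = sum(raw.values()) / len(raw)
    return {c: w / mean for c, w in raw.items()}

def weighted_cross_entropy(logits, labels, weights):
    """Weighted average of per-sample negative log-likelihoods under a softmax."""
    total, norm = 0.0, 0.0
    for z, y in zip(logits, labels):
        m = max(z)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in z))
        nll = log_sum_exp - z[y]
        total += weights[y] * nll
        norm += weights[y]
    return total / norm

# Toy pseudo labels (assumed data): a cluster of four samples and one singleton.
labels = [0, 0, 0, 0, 1]
logits = [[2.0, 0.0]] * 4 + [[0.0, 2.0]]
uniform = size_based_weights(labels, gamma=0.0)   # equal importance for every class
balanced = size_based_weights(labels, gamma=1.0)  # inverse-size weighting
loss = weighted_cross_entropy(logits, labels, balanced)
```

Setting `gamma=0.0` recovers the equal-importance treatment of all pseudo labels, while larger values shift weight toward small clusters. Whether small clusters should be up-weighted or down-weighted, and by how much, is exactly the kind of choice an adaptive scheme has to make, so the direction chosen here should be read as one possible option only.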
Section snippets
Person re-identification
Person Re-ID aims to match images of the same pedestrian across multiple camera views. Roughly, the existing approaches for person Re-ID can be divided into two groups: supervised person Re-ID [9], [10], [11], [12] and unsupervised person Re-ID [13], [14], [15]. Supervised person Re-ID relies on a large amount of manually labeled data for training and has achieved remarkably high accuracy. However, supervised person Re-ID methods require a huge amount of labeled data, which are expensive, and the
Our proposal: CAB
This section presents our proposed Cluster Consolidation plus Cluster Adaptive Balancing (CAB) learning framework for unsupervised person Re-ID.
For clarity, we illustrate our proposed CAB framework in Fig. 2. Overall, the CAB framework trains a convolutional neural network with the CAB loss to learn effective feature representations, where the supervision information is provided by the pseudo labels from the reorganized clustering. The key ingredients to improve the performance of
Experiments
To validate the effectiveness of our proposal, we conduct extensive experiments on two benchmark datasets and evaluate the effectiveness of each component with a set of ablation studies. We also visualize the cluster configuration and the varying distribution of the CAB loss during the training stage.
Conclusions
We have proposed a pseudo-label-based unsupervised learning framework for person Re-ID. In this framework, a cluster consolidation method improves the quality of the clustering result by excluding unreliable samples from the larger clusters, and a cluster adaptive balancing loss casts the pseudo-label-based unsupervised learning problem as a cluster-adaptive long-tail learning task, where the distribution of the pseudo labels changes with the clustering
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgment
This work is partially supported by the National Natural Science Foundation of China under Grant 61876022. The authors would like to thank the anonymous reviewers for their constructive comments to improve the quality of the work.
Ming-Kun Li is currently pursuing his Ph.D. degree in signal and information processing at the School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, PR China. His research interests include computer vision and machine learning, especially in unsupervised learning for person re-identification.
References (41)
- et al. Semi-supervised person re-identification using multi-view clustering. Pattern Recognit. (2019)
- et al. Learning hybrid ranking representation for person re-identification. Pattern Recognit. (2022)
- et al. Depth occlusion perception feature analysis for person re-identification. Pattern Recognit. Lett. (2020)
- et al. Improving person re-identification by attribute and identity learning. Pattern Recognit. (2019)
- et al. Alignedreid++: dynamically matching local information for person re-identification. Pattern Recognit. (2019)
- et al. Deviation based clustering for unsupervised person re-identification. Pattern Recognit. Lett. (2020)
- et al. Progressive sample mining and representation learning for one-shot person re-identification. Pattern Recognit. (2021)
- et al. Unsupervised domain adaptive re-identification: theory and practice. Pattern Recognit. (2020)
- et al. Person re-identification by unsupervised video matching. Pattern Recognit. (2017)
- et al. Towards open-world person re-identification by one-shot group-based verification. IEEE Trans. Pattern Anal. Mach. Intell. (2015)
- Self-paced contrastive learning with hybrid memory for domain adaptive object re-id. Adv. Neural Inf. Process. Syst.
- Mutual mean-teaching: pseudo label refinery for unsupervised domain adaptation on person re-identification. International Conference on Learning Representations
- Hierarchical clustering with hard-batch triplet loss for person re-identification. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
- A bottom-up clustering approach to unsupervised person re-identification. Proceedings of the AAAI Conference on Artificial Intelligence
- A density-based algorithm for discovering clusters in large spatial databases with noise. Knowledge Discovery and Data Mining
- Transferable joint attribute-identity deep learning for unsupervised person re-identification. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
- Transferring learning from multi-person tracking to person re-identification. Integr. Computer-Aided Eng.
- Unsupervised person re-identification via multi-label classification. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
- Unsupervised person re-identification via softened similarity learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
He Sun is currently pursuing his M.S. degree in signal and information processing at the School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, PR China. His research interests include computer vision, especially in person re-identification.
Chaoqun Lin received the M.S. degree in signal and information processing from the School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, PR China. His research interests include computer vision and graph neural networks.
Chun-Guang Li is currently an Associate Professor with the School of Artificial Intelligence, Beijing University of Posts and Telecommunications. He has served as an Area Chair for ICPR2020 and CVPR2021. His research interests are pattern recognition and machine learning, and he has published over 60 refereed papers.
Jun Guo has served as the Vice President of Beijing University of Posts and Telecommunications (BUPT). He is currently a principal Professor with the School of Artificial Intelligence, BUPT. His current research interests include pattern recognition theory and applications, information retrieval, content-based information security, and network management. He has published over 100 technical articles.