Elsevier

Pattern Recognition

Volume 129, September 2022, 108763
The devil in the tail: Cluster consolidation plus cluster adaptive balancing loss for unsupervised person re-identification

https://doi.org/10.1016/j.patcog.2022.108763

Highlights

  • We propose a simple yet effective approach, called cluster consolidation (CC), to reorganize the clustering result. The reorganization step can improve the compactness of larger clusters by pruning a proportion of unreliable samples into tiny clusters or singletons.

  • We propose a cluster adaptive balancing (CAB) loss to effectively train the network by automatically assigning proper weights to the imbalanced and noisy pseudo labels. In this way, the unsupervised person Re-ID task is formulated as a cluster adaptive long-tail learning problem.

  • Extensive experiments on widely used benchmark datasets are conducted and demonstrate state-of-the-art performance. A set of ablation studies is also provided.

Abstract

Unsupervised person re-identification (Re-ID) aims to retrieve pedestrians across different camera views without supervision information. State-of-the-art methods are usually built upon training a convolutional neural network with pseudo labels generated by clustering. Unfortunately, the pseudo labels are highly imbalanced and heavily noisy, carrying ineffective or even erroneous supervision information. To address these deficiencies, we present an effective clustering and reorganization approach, called Cluster Consolidation, which aims to separate a small proportion of unreliable data points from each cluster. This approach improves the quality of the pseudo labels, but also yields more tiny clusters. Thus, we further propose a Cluster Adaptive Balancing (CAB) loss to effectively train the network with the imbalanced pseudo labels; the CAB loss automatically balances the importance of each cluster. We conduct extensive experiments on widely used person Re-ID benchmark datasets and demonstrate the effectiveness of our proposals.

Introduction

Person re-identification (Re-ID) aims to track or match pedestrians across multiple camera views and locations under varying posture, illumination, weather conditions, and viewpoints. It is valuable for public security and criminal tracking. Existing person re-identification methods are mostly designed to train deep neural networks on manually labeled images. Unfortunately, the generalization performance of the trained networks degrades seriously on open-world data [1], [2], [3], [4]. It is therefore crucial to remove the heavy dependency on manually labeled data, which motivates unsupervised person Re-ID [2].

Unsupervised person Re-ID has attracted considerable research attention in recent years. The primary idea is to first run a clustering algorithm to generate pseudo labels and then train a Convolutional Neural Network (CNN) with those pseudo labels. For instance, hierarchical clustering is used in Zeng et al. [5] and Lin et al. [6] to generate pseudo labels, and k-means and DBSCAN [7] are jointly used in Ge et al. [3] to perform unsupervised domain adaptation. In practice, how well the CNN learns effective features depends on the quality of the generated pseudo labels. Unfortunately, the generated pseudo labels a) are contaminated with mistakes in larger clusters, and b) contain a number of tiny clusters or even singletons. The mistakes in the pseudo labels provide wrong supervision information and thus degrade performance. Moreover, the pseudo labels are heavily imbalanced because larger clusters, tiny clusters, and singletons coexist, which makes training the network more challenging.
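The pseudo-label generation step described above can be sketched with a minimal DBSCAN written in plain Python. This is only an illustration under stated assumptions: the 2-D toy points stand in for CNN features, the values of `eps` and `min_samples` are arbitrary, and the helper names `dbscan` and `to_pseudo_labels` are hypothetical. It is not the clustering configuration used in the paper, and the network training step is omitted.

```python
import math
from collections import Counter, deque


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN: returns one label per point, -1 for outliers."""
    n = len(points)
    # Brute-force neighbourhoods (each point is its own neighbour).
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisional outlier; may become a border point
            continue
        labels[i] = cluster
        queue = deque(neighbors[i])
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # core point: keep expanding
        cluster += 1
    return labels


def to_pseudo_labels(labels):
    """Give every outlier its own class so that singletons can be trained on."""
    next_id = max(labels, default=-1) + 1
    pseudo = []
    for lab in labels:
        if lab == -1:
            pseudo.append(next_id)
            next_id += 1
        else:
            pseudo.append(lab)
    return pseudo


# Toy "features": one larger cluster, one tiny cluster, two isolated points.
features = [
    (0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5), (0.2, 0.2), (0.3, 0.4),
    (5.0, 5.0), (5.5, 5.0), (5.0, 5.5),
    (10.0, 0.0), (0.0, 10.0),
]
pseudo_labels = to_pseudo_labels(dbscan(features, eps=1.0, min_samples=3))
sizes = Counter(pseudo_labels)
print(sorted(sizes.values(), reverse=True))  # cluster sizes, largest first
```

Even on this toy input the size profile is long-tailed: one larger cluster, one tiny cluster, and two singletons, which is the imbalance the paragraph above refers to.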

In previous works, singletons are viewed as noisy samples and are thus usually not used for training the network [4]. However, singletons also carry potentially rich information for training, so simply ignoring them wastes data. Some attempts regard each singleton as an extra independent class and jointly train the network with the data of all clusters [3], [8]. In these methods, however, the pseudo labels of larger clusters, tiny clusters, and singletons are all used with equal importance. This is not an effective way to exploit the supervision information provided by pseudo labels from clusters of different sizes.

In this paper, we focus on leveraging the supervision information provided by all clusters, whether larger clusters, tiny clusters, or singletons, to effectively train the network. Our approach consists of two components: a) improving the quality of the pseudo labels; and b) formulating the task of training the network with imbalanced pseudo labels as a long-tail learning problem.

Specifically, we propose a simple but effective approach to improve the quality of the pseudo labels, called Cluster Consolidation (CC), which excludes a small proportion of unreliable data points from clusters of larger size. Doing so reduces the mistakes in the pseudo labels of larger clusters, but at the same time generates more tiny clusters or even singletons (the unreliable data points excluded from larger clusters), resulting in a heavily imbalanced distribution of the pseudo labels. To tackle this imbalance, we propose a Cluster Adaptive Balancing (CAB) loss, which automatically assigns balancing weights to the pseudo labels with respect to cluster sizes and confidence. In this way, the supervision information from larger clusters, tiny clusters, and singletons alike can be properly exploited to efficiently train the network for feature learning.
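To make the consolidation idea concrete, here is a minimal sketch that prunes a fixed proportion of points from each larger cluster and turns them into singletons. The reliability criterion is an assumption: this excerpt does not say how unreliable points are identified, so the sketch simply treats the points farthest from the cluster centroid as unreliable. The function name `consolidate` and the parameters `ratio` and `min_size` are hypothetical.

```python
import math
from collections import defaultdict


def consolidate(points, labels, ratio=0.25, min_size=4):
    """Relabel the least reliable points of each larger cluster as singletons.

    Assumption: "unreliable" means farthest from the cluster centroid.
    Clusters with fewer than `min_size` members are left untouched.
    """
    members = defaultdict(list)
    for idx, lab in enumerate(labels):
        members[lab].append(idx)

    new_labels = list(labels)
    next_id = max(labels) + 1  # fresh ids for the new singletons
    for idxs in members.values():
        if len(idxs) < min_size:
            continue
        dim = len(points[idxs[0]])
        centroid = [
            sum(points[i][d] for i in idxs) / len(idxs) for d in range(dim)
        ]
        # Farthest-from-centroid first.
        ranked = sorted(
            idxs, key=lambda i: math.dist(points[i], centroid), reverse=True
        )
        n_drop = int(ratio * len(idxs))
        for i in ranked[:n_drop]:
            new_labels[i] = next_id
            next_id += 1
    return new_labels


# One cluster of four points containing a stray point, plus a tiny cluster.
toy_points = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (3.0, 3.0),
              (5.0, 5.0), (5.1, 5.0)]
toy_labels = [0, 0, 0, 0, 1, 1]
consolidated = consolidate(toy_points, toy_labels)
print(consolidated)
```

In this toy run the stray point leaves the larger cluster and becomes a new singleton, so the remaining cluster is more compact while the label distribution becomes more imbalanced, which is the trade-off the paragraph above describes.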

Paper contributions. The contributions of the paper are highlighted as follows.

  1. We propose a simple yet effective approach, called Cluster Consolidation (CC), to reorganize the clustering result. The reorganization step can improve the compactness of larger clusters by pruning a proportion of unreliable samples into tiny clusters or singletons.

  2. We propose a Cluster Adaptive Balancing (CAB) loss to effectively train the network by automatically assigning proper weights to the imbalanced and noisy pseudo labels. In this way, the unsupervised person Re-ID task is formulated as a cluster adaptive long-tail learning problem.

  3. Extensive experiments on widely used benchmark datasets are conducted and demonstrate state-of-the-art performance. A set of ablation studies is also provided.
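As a rough illustration of size-aware weighting in a long-tail setting, the sketch below applies the "effective number of samples" re-weighting, a generic class-balanced scheme from the long-tail learning literature, to a per-sample cross-entropy. It is a stand-in and not the CAB loss, whose exact form is not given in this excerpt; in particular it ignores the confidence term. The function names, the value of `beta`, and the toy logits are assumptions made for illustration.

```python
import math
from collections import Counter


def effective_number_weights(pseudo_labels, beta=0.9):
    """Per-cluster weights proportional to (1 - beta) / (1 - beta ** size).

    Smaller clusters get larger weights. Weights are rescaled so that
    their mean over clusters equals 1.
    """
    sizes = Counter(pseudo_labels)
    raw = {c: (1.0 - beta) / (1.0 - beta ** n) for c, n in sizes.items()}
    scale = len(raw) / sum(raw.values())
    return {c: w * scale for c, w in raw.items()}


def weighted_cross_entropy(logits, target, weights):
    """Cross-entropy of one sample, scaled by the weight of its cluster.

    Assumes cluster ids are 0..K-1 so they can index `logits` directly.
    """
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return weights[target] * (log_z - logits[target])


# Pseudo labels with one larger cluster, one tiny cluster, two singletons.
labels = [0] * 6 + [1] * 3 + [2] + [3]
weights = effective_number_weights(labels)

logits = [2.0, 0.5, 0.1, 0.1]  # toy classifier output over the four clusters
loss_head = weighted_cross_entropy(logits, 0, weights)  # sample from cluster 0
loss_tail = weighted_cross_entropy(logits, 2, weights)  # sample from a singleton
print(weights, loss_head, loss_tail)
```

A fixed, size-only weighting like this one treats every singleton as equally trustworthy. The contribution described above additionally accounts for noisy pseudo labels, which this sketch does not attempt.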

Section snippets

Person re-identification

Person Re-ID aims to match different pedestrian images from multiple camera views. Roughly, the existing approaches for person Re-ID can be divided into two groups: supervised person Re-ID [9], [10], [11], [12] and unsupervised person Re-ID [13], [14], [15]. Supervised person Re-ID is based on a large number of manually labeled data for training and has achieved remarkably high accuracy. However, supervised person Re-ID methods require a huge amount of labeled data, which are expensive, and the

Our proposal: C3AB

This section presents our proposed Cluster Consolidation plus Cluster Adaptive Balancing (C3AB) learning framework for unsupervised person Re-ID.

For clarity, we illustrate our proposed C3AB framework in Fig. 2. Overall, the C3AB framework trains a convolutional neural network with the CAB loss to learn effective feature representations, where the supervision information is provided by the pseudo labels from the reorganized clustering. The key ingredients to improve the performance of

Experiments

To validate the effectiveness of our proposal, we conduct extensive experiments on two benchmark datasets and evaluate the effectiveness of each component with a set of ablation studies. We also visualize the cluster configuration and the varying distribution of the CAB loss during the training stage.

Conclusions

We have proposed a pseudo-label-based unsupervised learning framework for person Re-ID, in which a cluster consolidation method is designed to improve the quality of the clustering result by excluding the unreliable samples in the clusters of larger size and a cluster adaptive balancing loss is proposed to formulate the pseudo-label-based unsupervised learning problem as a cluster-adaptive long-tail learning task, where the distribution of the pseudo labels changes with the clustering

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This work is partially supported by the National Natural Science Foundation of China under Grant 61876022. The authors would like to thank the anonymous reviewers for their constructive comments to improve the quality of the work.

Ming-Kun Li is currently pursuing his Ph.D. degree in signal and information processing at the School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, PR China. His research interests include computer vision and machine learning, especially in unsupervised learning for person re-identification.

References (41)

  • L. Zheng, Y. Yang, A. G. Hauptmann, Person re-identification: past, present and future, arXiv preprint...
  • Y. Ge et al., Self-paced contrastive learning with hybrid memory for domain adaptive object re-id, Adv. Neural Inf. Process. Syst. (2020)
  • Y. Ge et al., Mutual mean-teaching: pseudo label refinery for unsupervised domain adaptation on person re-identification, International Conference on Learning Representations (2020)
  • K. Zeng et al., Hierarchical clustering with hard-batch triplet loss for person re-identification, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2020)
  • Y. Lin et al., A bottom-up clustering approach to unsupervised person re-identification, Proceedings of the AAAI Conference on Artificial Intelligence (2019)
  • M. Ester et al., A density-based algorithm for discovering clusters in large spatial databases with noise, Knowledge Discovery and Data Mining (1996)
  • J. Wang et al., Transferable joint attribute-identity deep learning for unsupervised person re-identification, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2018)
  • M.J. Gómez-Silva et al., Transferring learning from multi-person tracking to person re-identification, Integr. Computer-Aided Eng. (2019)
  • D. Wang et al., Unsupervised person re-identification via multi-label classification, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2020)
  • Y. Lin et al., Unsupervised person re-identification via softened similarity learning, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2020)
He Sun is currently pursuing his M.S. degree in signal and information processing at the School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, PR China. His research interests include computer vision, especially person re-identification.

Chaoqun Lin received the M.S. degree in signal and information processing at the School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, PR China. His research interests include computer vision and graph neural networks.

Chun-Guang Li is currently an Associate Professor with the School of Artificial Intelligence, Beijing University of Posts and Telecommunications. He has served as an Area Chair for ICPR2020 and CVPR2021. His research interests are pattern recognition and machine learning, and he has published over 60 refereed papers.

Jun Guo has served as the Vice President of Beijing University of Posts and Telecommunications (BUPT). He is currently a principal Professor with the School of Artificial Intelligence, BUPT. His current research interests include pattern recognition theory and applications, information retrieval, content-based information security, and network management. He has published over 100 technical articles.
