Pattern Recognition

Volume 133, January 2023, 109012

Towards prior gap and representation gap for long-tailed recognition

https://doi.org/10.1016/j.patcog.2022.109012

Highlights

  • A unified theoretical framework for long-tailed recognition is established.

  • Corresponding mitigation solutions for the prior gap and the representation gap are proposed.

  • Existing and proposed methods are theoretically analyzed in terms of their impact on the two gaps.

  • The proposed methods yield superior performance on five long-tailed benchmarks.

Abstract

Most deep learning models are elaborately designed for balanced datasets, and thus they inevitably suffer performance degradation in practical long-tailed recognition tasks, especially on the minority classes. There are two crucial issues in learning from imbalanced datasets: a skewed decision boundary and an unrepresentative feature space. In this work, we establish a theoretical framework to analyze the sources of these two issues from a Bayesian perspective, and find that they are closely related to the prior gap and the representation gap, respectively. Under this framework, we show that existing long-tailed recognition methods manage to remove either the prior gap or the representation gap. Different from these methods, we propose to remove the two gaps simultaneously to achieve more accurate long-tailed recognition. Specifically, we propose a prior calibration strategy to remove the prior gap and introduce three strategies (representative feature extraction, optimization strategy adjustment and effective sample modeling) to mitigate the representation gap. Extensive experiments on five benchmark datasets validate the superiority of our method against state-of-the-art competitors.

Introduction

In many real-world applications, e.g., character recognition [1], electronic commodity recognition [2] and scene instance segmentation [3], datasets follow a long-tailed distribution, where the tail portion comprises a multitude of classes with scarce samples while the head portion covers very few classes with abundant samples [4], [5], [6]. When confronted with long-tailed datasets, existing deep learning methods, usually trained with equal weight and sampling rate for all samples, are likely to suffer a noticeable performance drop. As shown in Fig. 1(a), the imbalanced distribution forces the decision boundary to skew towards the minority classes. Meanwhile, the minority classes are not well characterized in the feature space. Therefore, existing deep methods tend to fail on unseen samples of the minority classes.

To remedy the above two issues, a great number of previous methods try to balance decision boundaries or enrich the feature space of minority classes. Existing works can be roughly categorized into three groups: re-sampling based methods [7], [8], [9], re-weighting based methods [10], [11], [12] and transfer learning strategies [13], [14], [15]. Re-sampling based methods oversample the minority classes and undersample the majority classes to highlight the significance of the minority classes. Re-weighting based methods assign different weights to different classes to balance the label distribution. Fig. 1(b) shows that although these two kinds of methods promote a uniform shift of the classification boundary, they hurt the feature representation, resulting in critical under-fitting of majority classes and over-fitting of minority classes. Transfer learning strategies enhance the feature representation of minority classes by transferring knowledge from majority classes with more complex models. However, such transfer by data enhancement only partially corrects the skewed decision boundary.
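
To make the two re-balancing families concrete, the sketch below builds inverse-frequency class weights and a class-balanced sampler. It is a minimal PyTorch illustration, not the paper's code; the class counts and label tensor are hypothetical stand-ins.

```python
import torch
from torch.utils.data import WeightedRandomSampler

# Hypothetical per-class counts of a long-tailed training set.
class_counts = torch.tensor([5000., 2000., 800., 300., 100.])

# Inverse-frequency class weights, as used by simple re-weighting methods,
# normalized so the average class weight is 1.
class_weights = 1.0 / class_counts
class_weights = class_weights / class_weights.sum() * len(class_counts)

# Per-sample weights for class-balanced re-sampling: every class is then
# drawn with roughly equal probability regardless of its cardinality.
labels = torch.randint(0, len(class_counts), (1000,))  # stand-in labels
sample_weights = class_weights[labels]
sampler = WeightedRandomSampler(sample_weights.tolist(),
                                num_samples=len(labels), replacement=True)
```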

In this work, we propose a theoretical framework for long-tailed recognition, considering jointly the unbalanced decision boundary and the uncharacterized feature space. Specifically, we construct the relationship between neural networks and the Bayesian classifier from Bayesian perspective. By comparing the ideal Bayesian classifier with the actual classifier learned from the training set, we find that the discrepancies lie in two aspects: prior gap and representation gap, revealing the reasons for unbalanced decision boundary (corresponding to prior gap) and uncharacterized feature space (corresponding to representation gap).

In turn, rethinking the long-tailed problem in terms of these two gaps further facilitates the understanding of recent methods [6], [8], [13], [16]. Traditional re-balancing methods, including re-sampling based methods and re-weighting based methods, try to eliminate the prior gap, yet they amplify the representation gap in the optimization stage. Recent state-of-the-art models, such as BBN [8], cRT [6], DRW [12] and LWS [6], benefit from easing the prior gap by re-weighting or re-sampling, while limiting the enlargement of the representation gap by keeping the feature encoder unchanged or by fine-tuning it. Transfer learning strategies [13], [14], [15] generate samples of minority classes, aiming to alleviate the prior gap and the representation gap simultaneously, but no specific evidence supports that they work well on both gaps. Recent threshold rescaling methods [16], [17], [18] try to estimate the prior gap empirically but ignore the effect of the representation gap. In summary, these existing methods are empirical and lack a unified theoretical framework, thus leading to suboptimal results. In contrast, this paper treats the two gaps explicitly and separately, and tries to relieve them simultaneously.
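
As a concrete reference point for the threshold rescaling family, the sketch below applies post-hoc logit adjustment in the spirit of Menon et al. (see the references): subtracting the scaled log training prior from the logits compensates for the prior gap under a uniform test distribution. The function name, counts and logits are illustrative assumptions, not the paper's implementation.

```python
import torch

def adjust_logits(logits, class_priors, tau=1.0):
    """Post-hoc logit adjustment: subtracting the (scaled) log training
    prior compensates for the gap between the long-tailed training label
    distribution and a uniform test distribution."""
    return logits - tau * torch.log(class_priors)

counts = torch.tensor([5000., 2000., 800., 300., 100.])  # hypothetical
priors = counts / counts.sum()          # empirical training prior p(w)
logits = torch.randn(8, 5)              # stand-in network outputs
preds = adjust_logits(logits, priors).argmax(dim=1)  # prior-corrected
```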

Under our theoretical framework, to alleviate the prior gap, we propose the prior calibration strategy. It can estimate the prior gap accurately without affecting the learned representation distribution, as demonstrated in Fig. 1(c). To relieve the representation gap, we propose three effective strategies applied in the training and test phases: representative feature extraction, optimization strategy adjustment and effective sample modeling. The representative feature extraction strategy aims to extract richer features from minority classes via self-supervised training. The optimization strategy adjustment tunes the training batch size and the number of training epochs according to the characteristics of the optimization process in long-tailed learning. The effective sample modeling strategy uses the valid sample number proposed by [8] to estimate the representation gap at the test phase. As shown in Fig. 1(d), when the prior gap is removed and the representation gap is relieved simultaneously, the classification performance improves significantly. We conducted extensive experiments on five benchmark datasets, namely CIFAR10-IMB, CIFAR100-IMB, ImageNet_LT, iNaturalist2018 and Place_LT; the results verify our analysis with respect to the joint impact of the two gaps and validate the superiority of our proposed method.
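
Effective sample modeling is related to the effective-number idea of Cui et al. (see the references); the sketch below computes effective-number-based class weights as an illustrative stand-in, assuming a hypothetical count vector. The paper's own test-phase estimate may differ in detail.

```python
import numpy as np

def effective_number_weights(class_counts, beta=0.999):
    """Class weights from the effective number of samples,
    E_n = (1 - beta**n) / (1 - beta); counts of large classes saturate,
    so their weights shrink less than pure inverse frequency suggests."""
    counts = np.asarray(class_counts, dtype=np.float64)
    effective_num = (1.0 - np.power(beta, counts)) / (1.0 - beta)
    weights = 1.0 / effective_num
    return weights / weights.sum() * len(counts)  # mean weight = 1

print(effective_number_weights([5000, 2000, 800, 300, 100]))
```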

In summary, the main contributions are as follows:

  • We establish a theoretical framework for long-tailed recognition from a Bayesian perspective, which unifies the prior gap and the representation gap. It contributes to a deeper understanding of existing methods and provides guidelines for future work.

  • We propose corresponding mitigation solutions for the prior gap and the representation gap, and theoretically analyze the existing methods and the proposed methods in terms of their impact on these two gaps.

  • Our experimental results demonstrate that the proposed method removes the prior gap and relieves the representation gap, and yields superior performance on five long-tailed benchmarks.

The remainder of this paper is organized as follows: Section 2 reviews related works. Section 3 introduces the preliminary formulations. Section 4 describes the theoretical framework unifying the prior gap and the representation gap. Section 5 proposes mitigation strategies to relieve the two gaps, and analyzes the validity of the proposed strategies. Section 6 provides experimental results and ablation studies on five long-tailed datasets. Section 7 draws concluding remarks.

Section snippets

Related work

Long-tailed recognition has received increasing attention in recent years because deep-learning-based recognition methods suffer serious performance degradation on long-tailed datasets. Current solutions to long-tailed learning mainly fall into three groups: re-sampling based methods, re-weighting based methods and transfer learning strategies. Besides, recent threshold rescaling approaches are much simpler yet have shown stronger performance.

Re-sampling based methods (also called

Preliminary

In this section, we define the notation used throughout this paper. Let $\mathcal{D}=\{(x,\omega)\}$ be a dataset, where $x$ is a sample and $\omega$ is the corresponding label. $\mathcal{D}$ is partitioned by class as $\mathcal{D}_i=\{(x,\omega)\mid\omega=i\}\subseteq\mathcal{D}$, $i=1,\dots,K$, with $K$ being the number of classes. The subscript of the sample is omitted here for simplicity. $|\cdot|$ denotes the cardinality of a set, so $|\mathcal{D}|=\sum_{i=1}^{K}|\mathcal{D}_i|$. Without loss of generality, let the classes be sorted by sample cardinality in ascending order, i.e., $|\mathcal{D}_1|\le|\mathcal{D}_2|\le\cdots\le|\mathcal{D}_K|$.
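
The partition and ordering above translate directly into code; the short sketch below is a hypothetical illustration of the notation, with `split_by_class` and `sort_classes_ascending` being names introduced here, not from the paper.

```python
from collections import defaultdict

def split_by_class(D):
    """Partition a dataset of (x, label) pairs into the subsets D_i."""
    parts = defaultdict(list)
    for x, w in D:
        parts[w].append((x, w))
    return parts

def sort_classes_ascending(parts):
    """Class indices ordered so that |D_1| <= |D_2| <= ... <= |D_K|."""
    return sorted(parts, key=lambda i: len(parts[i]))
```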

As to

Analysis

We make a theoretical connection between deep neural networks and the Bayesian classifier, pointing out the prior gap and the representation gap in long-tailed learning.

Under ideal conditions (the dataset is abundant enough to represent the real distribution, and the network model is complex enough), the outputs of a neural network approximate the Bayesian posterior probabilities. This proposition has been proven in detail by Richard et al. [26]. Ideally, the neural network classifier outputs
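
For intuition on where the two gaps enter, the following is a sketch of the Bayes decomposition consistent with the paper's framing; the paper's exact formulation appears later in Section 4 and is not reproduced in this snippet.

```latex
% Bayes decomposition of the posterior learned on the training set:
p_{\mathrm{tr}}(\omega = i \mid x)
  = \frac{p_{\mathrm{tr}}(x \mid \omega = i)\, p_{\mathrm{tr}}(\omega = i)}
         {p_{\mathrm{tr}}(x)} .
% Assuming a uniform test prior p_{\mathrm{te}}(\omega = i) = 1/K and
% matched class-conditionals, the balanced posterior satisfies
p_{\mathrm{te}}(\omega = i \mid x)
  \propto \frac{p_{\mathrm{tr}}(\omega = i \mid x)}{p_{\mathrm{tr}}(\omega = i)} ,
% so a mismatch in p(\omega) is the prior gap, while an inaccurate estimate
% of p(x | \omega) from scarce tail samples is the representation gap.
```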

Strategies towards two gaps

The analysis in Section 4 specifies the prior gap and the representation gap existing in long-tailed learning. Although these two gaps are closely coupled, we attempt to deal with them separately, while taking into account the holistic effect and performance.

Datasets and evaluation metric

We evaluate the performance of the proposed methods on five benchmark datasets. The datasets CIFAR10-IMB and CIFAR100-IMB are both constructed in the long-tailed imbalance mode, following [12], with imbalance factors 10, 50, 100. In particular, the sample number drops as an exponential function $n = n_i\,\mu^{i}$, where $n_i$ is the original class cardinality of class $i$, and $\mu\in(0,1)$. The imbalance factor is defined as the ratio of the largest class cardinality to the smallest class cardinality, i.e., $|\mathcal{D}_K|/|\mathcal{D}_1|$. The
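
Under these definitions, the per-class counts of such a long-tailed split can be generated as below. This is a sketch of the common CIFAR-LT protocol under the stated exponential decay, with `longtail_counts` a hypothetical helper rather than the authors' code.

```python
def longtail_counts(n_per_class, num_classes, imb_factor):
    """Exponentially decaying class counts n_i = n_per_class * mu**i,
    with mu chosen so the largest/smallest ratio equals imb_factor."""
    mu = (1.0 / imb_factor) ** (1.0 / (num_classes - 1))
    return [int(n_per_class * mu ** i) for i in range(num_classes)]

# CIFAR10-IMB with imbalance factor 100: counts decay from 5000 to 50.
print(longtail_counts(5000, 10, 100))
```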

Conclusion

In this paper, we identify the prior gap and the representation gap in long-tailed recognition from the perspectives of the Bayesian formula and model optimization. Based on this, we propose corresponding mitigation strategies for these two gaps, and provide in-depth analyses of diverse existing and proposed methods. Although it is hard to disentangle the two gaps completely, our method significantly improves the performance on a variety of benchmarks of long-tailed recognition tasks. Besides, fair

Data Availability

Towards Prior Gap and Representation Gap for Long-Tailed Recognition (Mendeley Data)

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work has been supported by the National Key Research and Development Program Grant 2018AAA0100400, the National Natural Science Foundation of China (NSFC) Grants U20A20223, 62076236 and 61721004.

References (56)

  • B. Kang et al., Decoupling representation and classifier for long-tailed recognition, in: Proc. International Conference on Learning Representations, 2020.

  • B. Zhou et al., BBN: bilateral-branch network with cumulative learning for long-tailed visual recognition, in: Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2020.

  • Y. Wang et al., Dynamic curriculum learning for imbalanced data classification, in: Proc. IEEE International Conference on Computer Vision, 2019.

  • K. Cao et al., Learning imbalanced datasets with label-distribution-aware margin loss, in: Proc. Conference on Neural Information Processing Systems, 2019.

  • P. Chu et al., Feature space augmentation for long-tailed data, in: Proc. European Conference on Computer Vision, 2020.

  • J. Liu et al., Deep representation learning on long-tailed data: a learnable embedding augmentation perspective, in: Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2020.

  • X. Yin et al., Feature transfer learning for face recognition with under-represented data, in: Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2019.

  • A.K. Menon et al., Long-tail learning via logit adjustment, in: Proc. International Conference on Learning Representations, 2021.

  • J. Tian et al., Posterior re-calibration for imbalanced datasets, in: Proc. Conference on Neural Information Processing Systems, 2020.

  • Y. Ding et al., Adaptive exploration for unsupervised person re-identification, ACM Trans. Multimed. Comput. Commun. Appl., 2020.

  • H. Zhang et al., mixup: Beyond empirical risk minimization, in: Proc. International Conference on Learning Representations, 2017.

  • B. Liu et al., Semi-supervised long-tailed recognition using alternate sampling, CoRR, 2021.

  • Y. Cui et al., Class-balanced loss based on effective number of samples, in: Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2019.

  • M. Ren et al., Learning to reweight examples for robust deep learning, in: Proc. International Conference on Machine Learning, 2018.

  • M.A. Jamal et al., Rethinking class-balanced methods for long-tailed visual recognition from a domain adaptation perspective, in: Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2020.

  • R. Singh et al., MetaMed: few-shot medical image classification using gradient-based meta-learning, Pattern Recognit., 2021.

  • S. Lawrence et al., Neural network classification and prior class probabilities, in: Neural Networks: Tricks of the Trade, 1998.

  • J. Chen et al., Decision threshold adjustment in class prediction, SAR QSAR Environ. Res., 2006.

    Ming-Liang Zhang received the B.S. degree in computational mathematics from Hefei University of Technology, Hefei, China, in 2018. He is currently pursuing his Ph.D. degree in pattern recognition and intelligent systems at the Institute of Automation, Chinese Academy of Sciences, Beijing, China. His research interests include machine learning, computer vision, pattern recognition and deep learning.

    Xu-Yao Zhang received the BS degree in computational mathematics from Wuhan University, Wuhan, China, in 2008 and the PhD degree in pattern recognition and intelligent systems from the Institute of Automation, Chinese Academy of Sciences, Beijing, China, in 2013. He is currently an associate professor in the National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences, Beijing, China. He was a visiting researcher at CENPARMI, Concordia University, in 2012. From March 2015 to March 2016, he was a visiting scholar in the Montreal Institute for Learning Algorithms (MILA), University of Montreal, Canada. His research interests include machine learning, pattern recognition, handwriting recognition, and deep learning.

    Chuang Wang has been an associate professor at the Institute of Automation, Chinese Academy of Sciences since September 2019. He received his Ph.D. degree in theoretical physics from the Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing, China, in 2015. He then joined the Paulson School of Engineering and Applied Sciences at Harvard University, first as a Postdoctoral Fellow (Feb 2015 - Jan 2018) and then as a Research Associate (Feb 2018 - Aug 2019) in the Signals, Information, and Networks Group. His research interests include machine learning theory; probabilistic graphical models; high-dimensional signal and information processing; and physics-inspired optimization algorithms.

    Cheng-Lin Liu is a Professor at the National Laboratory of Pattern Recognition, Institute of Automation of Chinese Academy of Sciences, and now the Director of the Laboratory. He received the PhD degree in pattern recognition and intelligent control from the Chinese Academy of Sciences, Beijing, China, in 1995. He was a postdoctoral fellow in Korea and Japan from March 1996 to March 1999. From 1999 to 2004, he was a researcher at the Central Research Laboratory, Hitachi, Ltd., Tokyo, Japan. His research interests include pattern recognition, image processing, neural networks, machine learning, and especially the applications to document analysis and recognition. He has published over 300 technical papers in journals and at conferences. He is an Associate Editor-in-Chief of Pattern Recognition Journal and Acta Automatica Sinica, an Associate Editor of International Journal on Document Analysis and Recognition, Cognitive Computation, IEEE/CAA Journal of Automatica Sinica, and CAAI Trans. Intelligence Technology. He is a Fellow of the CAA, CAAI, the IAPR and the IEEE.
