Pattern Recognition

Volume 133, January 2023, 109012

Towards prior gap and representation gap for long-tailed recognition

https://doi.org/10.1016/j.patcog.2022.109012

Highlights

  • A unified theoretical framework for long-tailed recognition is established.

  • Corresponding mitigation solutions for the prior gap and the representation gap are proposed.

  • Existing and proposed methods are theoretically analyzed in terms of their impact on the two gaps.

  • The proposed methods yield superior performance on five long-tailed benchmarks.

Abstract

Most deep learning models are elaborately designed for balanced datasets, and thus they inevitably suffer performance degradation in practical long-tailed recognition tasks, especially on the minority classes. There are two crucial issues in learning from imbalanced datasets: a skewed decision boundary and an unrepresentative feature space. In this work, we establish a theoretical framework to analyze the sources of these two issues from a Bayesian perspective, and find that they are closely related to the prior gap and the representation gap, respectively. Under this framework, we show that existing long-tailed recognition methods manage to remove either the prior gap or the representation gap. Different from these methods, we propose to remove the two gaps simultaneously to achieve more accurate long-tailed recognition. Specifically, we propose a prior calibration strategy to remove the prior gap and introduce three strategies (representative feature extraction, optimization strategy adjustment and effective sample modeling) to mitigate the representation gap. Extensive experiments on five benchmark datasets validate the superiority of our method against state-of-the-art competitors.

Introduction

In many real-world applications, e.g., character recognition [1], electronic commodity recognition [2] and scene instance segmentation [3], datasets follow a long-tailed distribution, where the tail portion comprises a multitude of classes with scarce samples while the head portion covers very few classes with abundant samples [4], [5], [6]. When confronted with long-tailed datasets, existing deep learning methods, usually trained with equal weight and sampling rate for all samples, are likely to suffer a noticeable performance drop. As shown in Fig. 1(a), the imbalanced distribution forces the decision boundary to skew towards the minority classes. Meanwhile, the minority classes are not well characterized in the feature space. Therefore, existing deep methods tend to fail on unseen samples of the minority classes.

To remedy the above two issues, a great number of previous methods try to balance decision boundaries or enrich the feature space of minority classes. Existing works can be roughly categorized into three groups: re-sampling based methods [7], [8], [9], re-weighting based methods [10], [11], [12] and transfer learning strategies [13], [14], [15]. Re-sampling based methods oversample the minority classes and undersample the majority classes to highlight the significance of the minority classes. Re-weighting based methods assign different weights to different classes to balance the label distribution. Fig. 1(b) shows that although these two kinds of methods promote a uniform shift of the classification boundary, they hurt the feature representation, resulting in critical under-fitting of majority classes and over-fitting of minority classes. Transfer learning strategies enhance the feature representation of minority classes by transferring knowledge from majority classes with more complex models. However, such transfer by data enhancement only partially corrects the skewed decision boundary.
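
To make the two re-balancing families concrete, the sketch below builds inverse-frequency class weights and a class-balanced sampler. It is a minimal PyTorch illustration, not the paper's code; the class counts and label tensor are hypothetical stand-ins.

```python
import torch
from torch.utils.data import WeightedRandomSampler

# Hypothetical per-class counts of a long-tailed training set.
class_counts = torch.tensor([5000., 2000., 800., 300., 100.])

# Inverse-frequency class weights, as used by simple re-weighting methods,
# normalized so the average class weight is 1.
class_weights = 1.0 / class_counts
class_weights = class_weights / class_weights.sum() * len(class_counts)

# Per-sample weights for class-balanced re-sampling: every class is then
# drawn with roughly equal probability regardless of its cardinality.
labels = torch.randint(0, len(class_counts), (1000,))  # stand-in labels
sample_weights = class_weights[labels]
sampler = WeightedRandomSampler(sample_weights.tolist(),
                                num_samples=len(labels), replacement=True)
```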

In this work, we propose a theoretical framework for long-tailed recognition, considering jointly the unbalanced decision boundary and the uncharacterized feature space. Specifically, we construct the relationship between neural networks and the Bayesian classifier from Bayesian perspective. By comparing the ideal Bayesian classifier with the actual classifier learned from the training set, we find that the discrepancies lie in two aspects: prior gap and representation gap, revealing the reasons for unbalanced decision boundary (corresponding to prior gap) and uncharacterized feature space (corresponding to representation gap).

In turn, rethinking the long-tailed problem in terms of these two gaps further facilitates the understanding of recent methods [6], [8], [13], [16]. Traditional re-balancing methods, including re-sampling based methods and re-weighting based methods, try to eliminate the prior gap, yet they amplify the representation gap in the optimization stage. Recent state-of-the-art models, such as BBN [8], cRT [6], DRW [12] and LWS [6], benefit from easing the prior gap by re-weighting or re-sampling, while limiting the enlargement of the representation gap by keeping the feature encoder unchanged or by fine-tuning it. Transfer learning strategies [13], [14], [15] generate samples of minority classes, aiming to alleviate the prior gap and the representation gap simultaneously, but no specific evidence supports that they work well on both gaps. Recent threshold rescaling methods [16], [17], [18] try to estimate the prior gap empirically but ignore the effect of the representation gap. In summary, these existing methods are empirical and lack a unified theoretical framework, thus leading to suboptimal results. In contrast, this paper treats the two gaps explicitly and separately, and tries to relieve them simultaneously.
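
As a concrete reference point for the threshold rescaling family, the sketch below applies post-hoc logit adjustment in the spirit of Menon et al. (see the references): subtracting the scaled log training prior from the logits compensates for the prior gap under a uniform test distribution. The function name, counts and logits are illustrative assumptions, not the paper's implementation.

```python
import torch

def adjust_logits(logits, class_priors, tau=1.0):
    """Post-hoc logit adjustment: subtracting the (scaled) log training
    prior compensates for the gap between the long-tailed training label
    distribution and a uniform test distribution."""
    return logits - tau * torch.log(class_priors)

counts = torch.tensor([5000., 2000., 800., 300., 100.])  # hypothetical
priors = counts / counts.sum()          # empirical training prior p(w)
logits = torch.randn(8, 5)              # stand-in network outputs
preds = adjust_logits(logits, priors).argmax(dim=1)  # prior-corrected
```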

Under our theoretical framework, to alleviate the prior gap, we propose the prior calibration strategy. It can estimate the prior gap accurately without affecting the learned representation distribution, as demonstrated in Fig. 1(c). To relieve the representation gap, we propose three effective strategies applied in the training and test phases: representative feature extraction, optimization strategy adjustment and effective sample modeling. The representative feature extraction strategy aims to extract richer features from minority classes via self-supervised training. The optimization strategy adjustment tunes the training batch size and the number of training epochs according to the characteristics of the optimization process in long-tailed learning. The effective sample modeling strategy uses the valid sample number proposed by [8] to estimate the representation gap at the test phase. As shown in Fig. 1(d), when the prior gap is removed and the representation gap is relieved simultaneously, the classification performance improves significantly. We conducted extensive experiments on five benchmark datasets, namely CIFAR10-IMB, CIFAR100-IMB, ImageNet_LT, iNaturalist2018 and Place_LT; the results verify our analysis with respect to the joint impact of the two gaps and validate the superiority of our proposed method.
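
Effective sample modeling is related to the effective-number idea of Cui et al. (see the references); the sketch below computes effective-number-based class weights as an illustrative stand-in, assuming a hypothetical count vector. The paper's own test-phase estimate may differ in detail.

```python
import numpy as np

def effective_number_weights(class_counts, beta=0.999):
    """Class weights from the effective number of samples,
    E_n = (1 - beta**n) / (1 - beta); counts of large classes saturate,
    so their weights shrink less than pure inverse frequency suggests."""
    counts = np.asarray(class_counts, dtype=np.float64)
    effective_num = (1.0 - np.power(beta, counts)) / (1.0 - beta)
    weights = 1.0 / effective_num
    return weights / weights.sum() * len(counts)  # mean weight = 1

print(effective_number_weights([5000, 2000, 800, 300, 100]))
```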

In summary, the main contributions are as follows:

  • We establish a theoretical framework for long-tailed recognition from a Bayesian perspective, which unifies the prior gap and the representation gap. It contributes to a deeper understanding of existing methods and provides guidelines for future work.

  • We propose corresponding mitigation solutions for the prior gap and the representation gap, and theoretically analyze the existing methods and the proposed methods in terms of their impact on these two gaps.

  • Our experimental results demonstrate that the proposed method removes the prior gap and relieves the representation gap, and yields superior performance on five long-tailed benchmarks.

The remainder of this paper is organized as follows: Section 2 reviews related works. Section 3 introduces the preliminary formulations. Section 4 describes the theoretical framework unifying the prior gap and the representation gap. Section 5 proposes mitigation strategies to relieve the two gaps, and analyzes the validity of the proposed strategies. Section 6 provides experimental results and ablation studies on five long-tailed datasets. Section 7 draws concluding remarks.

Section snippets

Related work

Long-tailed recognition has received increasing attention in recent years because deep-learning-based recognition methods suffer serious performance degradation on long-tailed datasets. Current solutions to long-tailed learning mainly fall into three groups: re-sampling based methods, re-weighting based methods and transfer learning strategies. Besides, recent threshold rescaling approaches are much simpler yet have shown stronger performance.

Re-sampling based methods (also called

Preliminary

In this section, we define the notation used throughout this paper. Let $\mathcal{D}=\{(x,\omega)\}$ be a dataset, where $x$ is a sample and $\omega$ is the corresponding label. $\mathcal{D}$ is partitioned by class as $\mathcal{D}_i=\{(x,\omega)\mid\omega=i\}\subseteq\mathcal{D}$, $i=1,\dots,K$, with $K$ being the number of classes. The subscript of the sample is omitted here for simplicity. $|\cdot|$ denotes the cardinality of a set, so $|\mathcal{D}|=\sum_{i=1}^{K}|\mathcal{D}_i|$. Without loss of generality, let the classes be sorted by sample cardinality in ascending order, i.e., $|\mathcal{D}_1|\le|\mathcal{D}_2|\le\cdots\le|\mathcal{D}_K|$.
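
The partition and ordering above translate directly into code; the short sketch below is a hypothetical illustration of the notation, with `split_by_class` and `sort_classes_ascending` being names introduced here, not from the paper.

```python
from collections import defaultdict

def split_by_class(D):
    """Partition a dataset of (x, label) pairs into the subsets D_i."""
    parts = defaultdict(list)
    for x, w in D:
        parts[w].append((x, w))
    return parts

def sort_classes_ascending(parts):
    """Class indices ordered so that |D_1| <= |D_2| <= ... <= |D_K|."""
    return sorted(parts, key=lambda i: len(parts[i]))
```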

As to

Analysis

We make a theoretical connection between deep neural networks and the Bayesian classifier, pointing out the prior gap and the representation gap in long-tailed learning.

Under ideal conditions (the dataset is abundant enough to represent the real distribution, and the network model is complex enough), the outputs of a neural network approximate the Bayesian posterior probabilities. This proposition has been proven in detail by Richard et al. [26]. Ideally, the neural network classifier outputs
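
For intuition on where the two gaps enter, the following is a sketch of the Bayes decomposition consistent with the paper's framing; the paper's exact formulation appears later in Section 4 and is not reproduced in this snippet.

```latex
% Bayes decomposition of the posterior learned on the training set:
p_{\mathrm{tr}}(\omega = i \mid x)
  = \frac{p_{\mathrm{tr}}(x \mid \omega = i)\, p_{\mathrm{tr}}(\omega = i)}
         {p_{\mathrm{tr}}(x)} .
% Assuming a uniform test prior p_{\mathrm{te}}(\omega = i) = 1/K and
% matched class-conditionals, the balanced posterior satisfies
p_{\mathrm{te}}(\omega = i \mid x)
  \propto \frac{p_{\mathrm{tr}}(\omega = i \mid x)}{p_{\mathrm{tr}}(\omega = i)} ,
% so a mismatch in p(\omega) is the prior gap, while an inaccurate estimate
% of p(x | \omega) from scarce tail samples is the representation gap.
```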

Strategies towards two gaps

The analysis in Section 4 specifies the prior gap and the representation gap existing in long-tailed learning. Although these two gaps are closely coupled, we attempt to deal with them separately, while taking into account the holistic effect and performance.

Datasets and evaluation metric

We evaluate the performance of the proposed methods on five benchmark datasets. The datasets CIFAR10-IMB and CIFAR100-IMB are both constructed in the long-tailed imbalance mode, following [12], with imbalance factors 10, 50, 100. In particular, the sample number drops as an exponential function $n = n_i\,\mu^{i}$, where $n_i$ is the original class cardinality of class $i$, and $\mu\in(0,1)$. The imbalance factor is defined as the ratio of the largest class cardinality to the smallest class cardinality, i.e., $|\mathcal{D}_K|/|\mathcal{D}_1|$. The
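
Under these definitions, the per-class counts of such a long-tailed split can be generated as below. This is a sketch of the common CIFAR-LT protocol under the stated exponential decay, with `longtail_counts` a hypothetical helper rather than the authors' code.

```python
def longtail_counts(n_per_class, num_classes, imb_factor):
    """Exponentially decaying class counts n_i = n_per_class * mu**i,
    with mu chosen so the largest/smallest ratio equals imb_factor."""
    mu = (1.0 / imb_factor) ** (1.0 / (num_classes - 1))
    return [int(n_per_class * mu ** i) for i in range(num_classes)]

# CIFAR10-IMB with imbalance factor 100: counts decay from 5000 to 50.
print(longtail_counts(5000, 10, 100))
```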

Conclusion

In this paper, we identify the prior gap and the representation gap in long-tailed recognition from the perspectives of the Bayesian formula and model optimization. Based on this, we propose corresponding mitigation strategies for these two gaps, and provide in-depth analyses of diverse existing and proposed methods. Although it is hard to disentangle the two gaps completely, our method significantly improves the performance on a variety of benchmarks of long-tailed recognition tasks. Besides, fair

Data Availability

Towards Prior Gap and Representation Gap for Long-Tailed Recognition (Mendeley Data)

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work has been supported by the National Key Research and Development Program Grant 2018AAA0100400, the National Natural Science Foundation of China (NSFC) Grants U20A20223, 62076236 and 61721004.

References (56)

  • B. Kang et al., Decoupling representation and classifier for long-tailed recognition, in: Proc. International Conference on Learning Representations, 2020.

  • B. Zhou et al., BBN: bilateral-branch network with cumulative learning for long-tailed visual recognition, in: Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2020.

  • Y. Wang et al., Dynamic curriculum learning for imbalanced data classification, in: Proc. IEEE International Conference on Computer Vision, 2019.

  • K. Cao et al., Learning imbalanced datasets with label-distribution-aware margin loss, in: Proc. Conference on Neural Information Processing Systems, 2019.

  • P. Chu et al., Feature space augmentation for long-tailed data, in: Proc. European Conference on Computer Vision, 2020.

  • J. Liu et al., Deep representation learning on long-tailed data: a learnable embedding augmentation perspective, in: Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2020.

  • X. Yin et al., Feature transfer learning for face recognition with under-represented data, in: Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2019.

  • A.K. Menon et al., Long-tail learning via logit adjustment, in: Proc. International Conference on Learning Representations, 2021.

  • J. Tian et al., Posterior re-calibration for imbalanced datasets, in: Proc. Conference on Neural Information Processing Systems, 2020.

  • Y. Ding et al., Adaptive exploration for unsupervised person re-identification, ACM Trans. Multimed. Comput. Commun. Appl., 2020.

  • H. Zhang et al., mixup: Beyond empirical risk minimization, in: Proc. International Conference on Learning Representations, 2017.

  • B. Liu et al., Semi-supervised long-tailed recognition using alternate sampling, CoRR, 2021.

  • Y. Cui et al., Class-balanced loss based on effective number of samples, in: Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2019.

  • M. Ren et al., Learning to reweight examples for robust deep learning, in: Proc. International Conference on Machine Learning, 2018.

  • M.A. Jamal et al., Rethinking class-balanced methods for long-tailed visual recognition from a domain adaptation perspective, in: Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2020.

  • R. Singh et al., MetaMed: few-shot medical image classification using gradient-based meta-learning, Pattern Recognit., 2021.

  • S. Lawrence et al., Neural network classification and prior class probabilities, in: Neural Networks: Tricks of the Trade, 1998.

  • J. Chen et al., Decision threshold adjustment in class prediction, SAR QSAR Environ. Res., 2006.

    Ming-Liang Zhang received the B.S. degree in computational mathematics from Hefei University of Technology, Hefei, China, in 2018. He is currently pursuing his Ph.D. degree in pattern recognition and intelligent systems at the Institute of Automation, Chinese Academy of Sciences, Beijing, China. His research interests include machine learning, computer vision, pattern recognition and deep learning.

    Xu-Yao Zhang received the BS degree in computational mathematics from Wuhan University, Wuhan, China, in 2008 and the PhD degree in pattern recognition and intelligent systems from the Institute of Automation, Chinese Academy of Sciences, Beijing, China, in 2013. He is currently an associate professor in the National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences, Beijing, China. He was a visiting researcher at CENPARMI, Concordia University, in 2012. From March 2015 to March 2016, he was a visiting scholar in the Montreal Institute for Learning Algorithms (MILA), University of Montreal, Canada. His research interests include machine learning, pattern recognition, handwriting recognition, and deep learning.

    Chuang Wang has been an associate professor at the Institute of Automation, Chinese Academy of Sciences since September 2019. He received his Ph.D. degree in theoretical physics from the Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing, China, in 2015. He then joined the Paulson School of Engineering and Applied Sciences at Harvard University, first as a Postdoctoral Fellow (Feb 2015 - Jan 2018) and then as a Research Associate (Feb 2018 - Aug 2019) in the Signals, Information, and Networks Group. His research interests include machine learning theory; probabilistic graphical models; high-dimensional signal and information processing; and physics-inspired optimization algorithms.

    Cheng-Lin Liu is a Professor at the National Laboratory of Pattern Recognition, Institute of Automation of Chinese Academy of Sciences, and now the Director of the Laboratory. He received the PhD degree in pattern recognition and intelligent control from the Chinese Academy of Sciences, Beijing, China, in 1995. He was a postdoctoral fellow in Korea and Japan from March 1996 to March 1999. From 1999 to 2004, he was a researcher at the Central Research Laboratory, Hitachi, Ltd., Tokyo, Japan. His research interests include pattern recognition, image processing, neural networks, machine learning, and especially the applications to document analysis and recognition. He has published over 300 technical papers in journals and at conferences. He is an Associate Editor-in-Chief of Pattern Recognition Journal and Acta Automatica Sinica, an Associate Editor of International Journal on Document Analysis and Recognition, Cognitive Computation, IEEE/CAA Journal of Automatica Sinica, and CAAI Trans. Intelligence Technology. He is a Fellow of the CAA, CAAI, the IAPR and the IEEE.
