Abstract
Long-tailed image recognition is a challenging task in real-world scenarios with large-scale data. Popular strategies, such as loss reweighting and data resampling, aim to reduce the model bias toward head classes. Specifically, different loss reweighting approaches explore various endogenous or exogenous measures. In this paper, we study a new endogenous measure called discriminant quality (DQ), which considers both validation accuracy and discriminant uncertainty. DQ takes advantage of continuous information accumulated over a period of time. It is more robust than instantaneous information because it mitigates the measurement instability caused by random perturbations during training. Additionally, the weight of each class is automatically rebalanced based on DQ, and the class weights support a dynamic updating strategy driven by the significance of DQ differences. Experiments on MNIST-LT, CIFAR-100-LT, ImageNet-LT, and Places-LT demonstrated the superiority of DQ over state-of-the-art methods in terms of prediction accuracy.
References
Bengio S (2015) Sharing representations for long tail computer vision problems. In: ICMI, pp. 1–1. https://doi.org/10.1145/2818346.2818348
Cai J-R, Wang Y-Z, Hwang J-N (2021) Ace: ally complementary experts for solving long-tailed recognition in one-shot. In: ICCV, pp. 112–121
Cao K, Wei C, Gaidon A, Aréchiga N, Ma T (2019) Learning imbalanced datasets with label-distribution-aware margin loss. In: NeurIPS, pp. 1567–1578. http://arxiv.org/abs/1906.07413
Chen T, Kornblith S, Norouzi M, Hinton G (2020) A simple framework for contrastive learning of visual representations. In: ICML. https://doi.org/10.5555/3524938.3525087
Cui Y, Song Y, Sun C, Howard A, Belongie S (2018) Large scale fine-grained categorization and domain-specific transfer learning. In: CVPR, pp. 4109–4118
Cui Y, Jia M-L, Lin T-Y, Song Y, Belongie S (2019) Class-balanced loss based on effective number of samples. In: CVPR, pp. 9268–9277. https://doi.org/10.1109/cvpr.2019.00949
Cui J-Q, Zhong Z-S, Liu S, Yu B, Jia J-Y (2021) Parametric contrastive learning. In: ICCV, pp. 715–724
Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L (2009) Imagenet: a large-scale hierarchical image database. In: CVPR, vol. 1, pp. 248–255. https://doi.org/10.1109/CVPR.2009.5206848
Duggal R, Freitas S, Dhamnani S, Chau DH, Sun J (2020) ELF: an early-exiting framework for long-tailed classification. arXiv preprint arXiv:2006.11979
Finn C, Abbeel P, Levine S (2017) Model-agnostic meta-learning for fast adaptation of deep networks. In: ICML, vol. 70, pp. 1126–1135
Goyal P, Dollár P, Girshick R, Noordhuis P, Wesolowski L, Kyrola A, Tulloch A, Jia Y, He K (2018) Accurate, large minibatch SGD: training imagenet in 1 hour. https://doi.org/10.48550/arXiv.1706.02677
He H-B, Bai Y, Garcia EA, Li S (2008) Adasyn: adaptive synthetic sampling approach for imbalanced learning. In: IJCNN, pp. 1322–1328. https://doi.org/10.1109/IJCNN.2008.4633969
He K-M, Zhang X-Y, Ren S-Q, Sun J (2016) Deep residual learning for image recognition. In: CVPR, pp. 770–778. https://doi.org/10.1109/CVPR.2016.90
He K, Fan H-Q, Wu Y-X, Xie S-N, Girshick R (2020) Momentum contrast for unsupervised visual representation learning. In: CVPR
Hong Y, Han S, Choi K, Seo S, Kim B, Chang B (2021) Disentangling label distribution for long-tailed visual recognition. In: CVPR, pp. 6626–6636
Huang C, Li Y, Loy CC, Tang X-O (2016) Learning deep representation for imbalanced classification. In: CVPR
Jamal MA, Brown M, Yang M-H, Wang L, Gong B (2020) Rethinking class-balanced methods for long-tailed visual recognition from a domain adaptation perspective. In: CVPR
Johnson JM, Khoshgoftaar TM (2019) Survey on deep learning with class imbalance. J Big Data 6(27):1–54. https://doi.org/10.1186/s40537-019-0192-5
Kang B-Y, Xie S-N, Rohrbach M, Yan Z-C, Gordo A, Feng J-S, Kalantidis Y (2020) Decoupling representation and classifier for long-tailed recognition. In: ICLR. https://openreview.net/forum?id=r1gRTCVFvB
Khosla P, Teterwak P, Wang C, Sarna A, Tian Y, Isola P, Maschinot A, Liu C, Krishnan D (2020) Supervised contrastive learning. In: NeurIPS, vol. 33, pp. 18661–18673
King G, Zeng L-C (2001) Logistic regression in rare events data. Soc Sci Electron Publ 9(2):137–163. https://doi.org/10.1093/oxfordjournals.pan.a004868
Krizhevsky A, Hinton GE (2009) Learning multiple layers of features from tiny images
LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324. https://doi.org/10.1109/5.726791
Leng Z-Q, Tan M-X, Liu C-X, Cubuk ED, Shi J, Cheng S-Y, Anguelov D (2022) Polyloss: a polynomial expansion perspective of classification loss functions. In: ICLR
Lin T-Y, Goyal P, Girshick R, He K-M, Dollár P (2017) Focal loss for dense object detection. In: ICCV, pp. 2980–2988. https://doi.org/10.1109/iccv.2017.324
Liu Z-W, Miao Z-Q, Zhang X-H, Wang J-Y, Guo B-Q, Yu SX (2019) Large-scale long-tailed recognition in an open world. In: CVPR
Liu B, Li H-X, Kang H, Hua G, Vasconcelos N (2021) Gistnet: a geometric structure transfer network for long-tailed recognition. In: ICCV, pp. 8189–8198
Mahajan D, Girshick R, Ramanathan V, He K, Paluri M, Li Y, Bharambe A, van der Maaten L (2018) Exploring the limits of weakly supervised pretraining. In: ECCV, pp. 185–201
Ouyang W-L, Wang X-G, Zhang C, Yang X-K (2016) Factors in finetuning deep model for object detection with long-tail distribution. In: CVPR, pp. 864–873. https://doi.org/10.1109/CVPR.2016.100
Ren M-Y, Zeng W-Y, Yang B, Urtasun R (2018) Learning to reweight examples for robust deep learning. In: ICML
Ren J-W, Yu C-J, Sheng S, Ma X, Zhao H-Y, Yi S, Li H-S (2020) Balanced meta-softmax for long-tailed visual recognition. In: NeurIPS, vol. 33, pp. 4175–4186
Samuel D, Chechik G (2021) Distributional robustness loss for long-tail learning. In: ICCV, pp. 9495–9504
Shen L, Lin Z, Huang Q (2016) Relay backpropagation for effective learning of deep convolutional neural networks. In: ECCV, pp. 467–482
Shu J, Xie Q, Yi L-X, Zhao Q, Zhou S-P, Xu Z-B, Meng D-Y (2019) Meta-weight-net: Learning an explicit mapping for sample weighting. In: NeurIPS, pp. 1919–1930
Sinha S, Ohashi H (2022) Difficulty-net: learning to predict difficulty for long-tailed recognition. arXiv preprint arXiv:2209.02960
Sinha S, Ohashi H, Nakamura K (2020) Class-wise difficulty-balanced loss for solving class-imbalance. In: ACCV. https://doi.org/10.48550/arXiv.2010.01824
Sinha S, Ohashi H, Nakamura K (2022) Class-difficulty based methods for long-tailed visual recognition. Int J Comput Vis 130(10):2517–2531
Tan J-R, Wang C-B, Li B-Y, Li Q-Q, Ouyang W-L, Yin C-Q, Yan J-J (2020) Equalization loss for long-tailed object recognition. In: CVPR, pp. 11662–11671. https://doi.org/10.1109/CVPR42600.2020.01168
Tang K-H, Huang J-Q, Zhang H-W (2020) Long-tailed classification by keeping the good and removing the bad momentum causal effect. NeurIPS 33:1513–1524
Torralba A, Fergus R, Freeman WT (2008) 80 million tiny images: a large data set for nonparametric object and scene recognition. IEEE Trans Pattern Anal Mach Intell 30(11):1958–1970. https://doi.org/10.1109/TPAMI.2008.128
Van der Maaten L, Hinton G (2008) Visualizing data using t-sne. J Mach Learn Res 9(11):2579–2605
Wan W-T, Chen J-S, Li T-P, Huang Y-Q, Tian J-Q, Yu C, Xue Y-Z (2019) Information entropy based feature pooling for convolutional neural networks. In: ICCV
Wang Y-X, Ramanan D, Hebert M (2017) Learning to model the tail. In: NeurIPS, vol. 30, pp. 1–11
Wang M, Lin Y, Min F, Liu D (2019) Cost-sensitive active learning through statistical methods. Inf Sci 501:460–482. https://doi.org/10.1016/j.ins.2019.06.015
Wang P, Han K, Wei X-S, Zhang L, Wang L (2021) Contrastive learning based hybrid networks for long-tailed image classification. In: CVPR, pp. 943–952
Wu Y-X, Min X-Y, Min F, Wang M (2019) Cost-sensitive active learning with a label uniform distribution model. Int J Approx Reason 105:49–65. https://doi.org/10.1016/j.ijar.2018.11.004
Wu Y-X, Hu Z-N, Wang Y-Y, Min F (2022) Rare potential poor household identification with a focus embedded logistic regression. IEEE Access 10:32954–32972. https://doi.org/10.1109/ACCESS.2022.3161574
Yang Y-Z, Xu Z (2020) Rethinking the value of labels for improving class-imbalanced learning. NeurIPS 33:19290–19301
Yu S-H, Guo J-F, Zhang R-Q, Fan Y-X, Wang Z-Z, Cheng X-Q (2022) A re-balancing strategy for class-imbalanced classification based on instance difficulty. In: CVPR, pp. 70–79
Zhang Y-F, Kang B-Y, Hooi B, Yan S-C, Feng J-S (2021) Deep long-tailed learning: a survey. IEEE Trans Pattern Anal Mach Intell 1:1–20. https://doi.org/10.1109/TPAMI.2023.3268118
Zhang S-Y, Li Z-M, Yan S-P, He X-M, Sun J (2021) Distribution alignment: a unified framework for long-tail visual recognition. In: CVPR, pp. 2361–2370
Zhong Z-S, Cui J-Q, Liu S, Jia J-Y (2021) Improving calibration for long-tailed recognition. In: CVPR, pp. 16489–16498
Zhou B, Lapedriza A, Khosla A, Oliva A, Torralba A (2018) Places: a 10 million image database for scene recognition. IEEE Trans Pattern Anal Mach Intell 40(6):1452–1464. https://doi.org/10.1109/TPAMI.2017.2723009
Zhu L-C, Yang Y (2020) Inflated episodic memory with region self-attention for long-tailed visual recognition. In: CVPR
Funding
This work was supported in part by Central Government Funds of Guiding Local Scientific and Technological Development (Grant Number 2021ZYD0003), the National Natural Science Foundation of China (Grant Number 62006200), the Sichuan Science and Technology Program of China (Grant Number 2021YFS0407), and the Sichuan Provincial Transfer Payment Program of China (Grant Number R21ZYZF0006). We thank Zhi-Heng Zhang for his valuable suggestions. We thank Liwen Bianji (Edanz) (www.liwenbianji.cn/) for editing the English text of a draft of this manuscript.
Author information
Authors and Affiliations
Contributions
YXW proposed the original methodology and wrote the main manuscript. FM supervised the paper writing and method improvement. BWZ and XJW wrote the code and undertook the experiments. All authors reviewed the manuscript.
Corresponding authors
Ethics declarations
Competing interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix: Discussion of cross-entropy based losses
We discuss seven types of cross-entropy based losses: TWL (King and Zeng 2001), FL (Lin et al. 2017), CBL (Cui et al. 2019), EQL (Tan et al. 2020), PolyLoss (Leng et al. 2022), CDB (Sinha et al. 2020), and DQBL. They fall into two categories: class-wise weighted loss (CWL) and sample-wise weighted loss (SWL). CWL examines properties of each class in the training or validation data and assigns weights class by class, whereas SWL uses the classification probability (i.e., p) of each instance as the basis for adjusting sample weights during training. For instance, TWL assigns fixed weights to each class using the number of training samples, aiming to compensate for differences between the sample distribution and the population distribution. CBL considers the effective number of instances, thereby adjusting the weight of each class. CDB measures class difficulty by considering the overall evaluation accuracy. In DQBL, the class weights are adjusted based on DQ. FL and PolyLoss belong to the SWL family because they adjust the sample weights according to p. FL widens the gap in relative weights between easy and hard samples, thus improving the classification accuracy of hard samples. PolyLoss treats the sample weights as polynomial functions of p; the weighted sum corresponds to the Taylor expansion of CEL or FL.
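The p-dependent weightings discussed above can be sketched for a single sample. The formulas below follow the published definitions of FL, Poly-1 (the simplest PolyLoss variant), and the CBL effective-number weight; the hyperparameter values `gamma`, `eps`, and `beta` are illustrative defaults, not those tuned in this paper.

```python
import math

def cross_entropy(p):
    # Standard CE for the true class with predicted probability p.
    return -math.log(p)

def focal_loss(p, gamma=2.0):
    # FL down-weights easy samples (p near 1) via the factor (1 - p)^gamma,
    # widening the gap between easy and hard samples.
    return (1.0 - p) ** gamma * cross_entropy(p)

def poly1_loss(p, eps=1.0):
    # Poly-1: CE plus a perturbation of the first polynomial term (1 - p)
    # in the Taylor expansion of CE.
    return cross_entropy(p) + eps * (1.0 - p)

def class_balanced_weight(n, beta=0.999):
    # CBL: class weight inversely proportional to the effective number
    # of samples (1 - beta^n) / (1 - beta); rarer classes get larger weights.
    return (1.0 - beta) / (1.0 - beta ** n)
```

Note that `focal_loss` and `poly1_loss` are computed per sample (SWL), while `class_balanced_weight` depends only on the class size `n` (CWL).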
Next, we examine the flexibility of weight adjustment for each loss, characterized by the weight-adjusting frequency during training. The two SWLs, FL and PolyLoss, both adjust weights batch by batch. Within the CWL family, TWL and CBL assign fixed weights for the entire training process. These fixed-weighting losses only consider an endogenous property of the training data, namely the distribution of each class. Softmax EQL also belongs to CWL; because of its random factor, it modifies the class weights batch by batch. In the machine learning domain, a general hypothesis is that the distributions of training and validation data are consistent. In this paper, we satisfied this hypothesis because the training and validation data were randomly drawn from the same sample space. Although the distribution of the validation data seems to be of little use, these data still carry exogenous properties, such as class difficulty (Sinha et al. 2020) and misclassification cost (Wu et al. 2019). In DQBL, we consider both validation accuracy and discriminant uncertainty to measure DQ throughout training. The class weights of DQBL are rebalanced epoch by epoch, which is similar to the reweighting mechanism of CDB. To summarize, the comparison among these losses in terms of flexibility is:
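The epoch-wise rebalancing shared by CDB and DQBL can be sketched generically. The sketch below is illustrative only: `scores` stands in for any per-class quality measure evaluated on validation data at the end of an epoch, and the inverse-power rule with sharpness `tau` is a hypothetical placeholder, not the paper's DQ formula.

```python
def rebalance_weights(scores, tau=1.0):
    """Epoch-wise class reweighting: classes with lower scores
    (harder classes) receive larger weights.

    scores: per-class quality measures in (0, 1], e.g. validation accuracy.
    tau: sharpness of the reweighting; tau = 0 recovers uniform weights.
    """
    # Invert the scores so that poorly performing classes are emphasized;
    # the epsilon guards against division by zero.
    inv = [(1.0 / max(s, 1e-8)) ** tau for s in scores]
    # Normalize so the weights average to 1 across the k classes,
    # keeping the overall loss scale comparable between epochs.
    k = len(scores)
    total = sum(inv)
    return [k * w / total for w in inv]
```

In a training loop, such a function would be called once per epoch with the latest validation measurements, in contrast to FL and PolyLoss, whose weights change with every batch.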
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wu, YX., Min, F., Zhang, BW. et al. Long-tailed image recognition through balancing discriminant quality. Artif Intell Rev 56 (Suppl 1), 833–856 (2023). https://doi.org/10.1007/s10462-023-10544-x
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10462-023-10544-x