
Long-tailed image recognition through balancing discriminant quality


Abstract

Long-tailed image recognition is a challenging task in real scenes with large-scale data. Popular strategies, such as loss reweighting and data resampling, aim to reduce the model bias toward head classes. Specifically, different loss reweighting approaches explore various endogenous or exogenous measures. In this paper, we study a new endogenous measure called discriminant quality (DQ) that considers both validation accuracy and discriminant uncertainty. DQ takes advantage of continuous information accumulated over a period of time; it is more robust than instantaneous information because it mitigates the measurement instability caused by random perturbations during training. Additionally, the weight of each class is automatically rebalanced based on DQ, which supports the design of a dynamic updating strategy for the significance of the DQ difference. Experiments on MNIST-LT, CIFAR-100-LT, ImageNet-LT, and Places-LT demonstrate the superiority of our method over state-of-the-art approaches in terms of prediction accuracy.


References

  • Bengio S (2015) Sharing representations for long tail computer vision problems. In: ICMI, pp. 1–1. https://doi.org/10.1145/2818346.2818348

  • Cai J-R, Wang Y-Z, Hwang J-N (2021) ACE: ally complementary experts for solving long-tailed recognition in one-shot. In: ICCV, pp. 112–121

  • Cao K, Wei C, Gaidon A, Aréchiga N, Ma T (2019) Learning imbalanced datasets with label-distribution-aware margin loss. In: NeurIPS, pp. 1567–1578. http://arxiv.org/abs/1906.07413

  • Chen T, Kornblith S, Norouzi M, Hinton G (2020) A simple framework for contrastive learning of visual representations. In: ICML. https://doi.org/10.5555/3524938.3525087

  • Cui Y, Song Y, Sun C, Howard A, Belongie S (2018) Large scale fine-grained categorization and domain-specific transfer learning. In: CVPR, pp. 4109–4118

  • Cui Y, Jia M-L, Lin T-Y, Song Y, Belongie S (2019) Class-balanced loss based on effective number of samples. In: CVPR, pp. 9268–9277. https://doi.org/10.1109/cvpr.2019.00949

  • Cui J-Q, Zhong Z-S, Liu S, Yu B, Jia J-Y (2021) Parametric contrastive learning. In: ICCV, pp. 715–724

  • Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L (2009) ImageNet: a large-scale hierarchical image database. In: CVPR, vol. 1, pp. 248–255. https://doi.org/10.1109/CVPR.2009.5206848

  • Duggal R, Freitas S, Dhamnani S, Chau DH, Sun J (2020) ELF: an early-exiting framework for long-tailed classification. arXiv preprint arXiv:2006.11979

  • Finn C, Abbeel P, Levine S (2017) Model-agnostic meta-learning for fast adaptation of deep networks. In: ICML, vol. 70, pp. 1126–1135

  • Goyal P, Dollár P, Girshick R, Noordhuis P, Wesolowski L, Kyrola A, Tulloch A, Jia Y, He K (2018) Accurate, large minibatch SGD: training ImageNet in 1 hour. https://doi.org/10.48550/arXiv.1706.02677

  • He H-B, Bai Y, Garcia EA, Li S (2008) ADASYN: adaptive synthetic sampling approach for imbalanced learning. In: IJCNN, pp. 1322–1328. https://doi.org/10.1109/IJCNN.2008.4633969

  • He K-M, Zhang X-Y, Ren S-Q, Sun J (2016) Deep residual learning for image recognition. In: CVPR, pp. 770–778. https://doi.org/10.1109/CVPR.2016.90

  • He K, Fan H-Q, Wu Y-X, Xie S-N, Girshick R (2020) Momentum contrast for unsupervised visual representation learning. In: CVPR

  • Hong Y, Han S, Choi K, Seo S, Kim B, Chang B (2021) Disentangling label distribution for long-tailed visual recognition. In: CVPR, pp. 6626–6636

  • Huang C, Li Y, Loy CC, Tang X-O (2016) Learning deep representation for imbalanced classification. In: CVPR

  • Jamal MA, Brown M, Yang M-H, Wang L, Gong B (2020) Rethinking class-balanced methods for long-tailed visual recognition from a domain adaptation perspective. In: CVPR

  • Johnson JM, Khoshgoftaar TM (2019) Survey on deep learning with class imbalance. J Big Data 6(27):1–54. https://doi.org/10.1186/s40537-019-0192-5

  • Kang B-Y, Xie S-N, Rohrbach M, Yan Z-C, Gordo A, Feng J-S, Kalantidis Y (2020) Decoupling representation and classifier for long-tailed recognition. In: ICLR. https://openreview.net/forum?id=r1gRTCVFvB

  • Khosla P, Teterwak P, Wang C, Sarna A, Tian Y, Isola P, Maschinot A, Liu C, Krishnan D (2020) Supervised contrastive learning. In: NeurIPS, vol. 33, pp. 18661–18673

  • King G, Zeng L-C (2001) Logistic regression in rare events data. Soc Sci Electron Publ 9(2):137–163. https://doi.org/10.1093/oxfordjournals.pan.a004868

  • Krizhevsky A, Hinton GE (2009) Learning multiple layers of features from tiny images

  • LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324. https://doi.org/10.1109/5.726791

  • Leng Z-Q, Tan M-X, Liu C-X, Cubuk ED, Shi J, Cheng S-Y, Anguelov D (2022) PolyLoss: a polynomial expansion perspective of classification loss functions. In: ICLR

  • Lin T-Y, Goyal P, Girshick R, He K-M, Dollár P (2017) Focal loss for dense object detection. In: ICCV, pp. 2980–2988. https://doi.org/10.1109/iccv.2017.324

  • Liu Z-W, Miao Z-Q, Zhang X-H, Wang J-Y, Guo B-Q, Yu SX (2019) Large-scale long-tailed recognition in an open world. In: CVPR

  • Liu B, Li H-X, Kang H, Hua G, Vasconcelos N (2021) Gistnet: a geometric structure transfer network for long-tailed recognition. In: ICCV, pp. 8189–8198

  • Mahajan D, Girshick R, Ramanathan V, He K, Paluri M, Li Y, Bharambe A, van der Maaten L (2018) Exploring the limits of weakly supervised pretraining. In: ECCV, pp. 185–201

  • Ouyang W-L, Wang X-G, Zhang C, Yang X-K (2016) Factors in finetuning deep model for object detection with long-tail distribution. In: CVPR, pp. 864–873. https://doi.org/10.1109/CVPR.2016.100

  • Ren M-Y, Zeng W-Y, Yang B, Urtasun R (2018) Learning to reweight examples for robust deep learning. In: ICML

  • Ren J-W, Yu C-J, Sheng S, Ma X, Zhao H-Y, Yi S, Li H-S (2020) Balanced meta-softmax for long-tailed visual recognition. In: NeurIPS, vol. 33, pp. 4175–4186

  • Samuel D, Chechik G (2021) Distributional robustness loss for long-tail learning. In: ICCV, pp. 9495–9504

  • Shen L, Lin Z, Huang Q (2016) Relay backpropagation for effective learning of deep convolutional neural networks. In: ECCV, pp. 467–482

  • Shu J, Xie Q, Yi L-X, Zhao Q, Zhou S-P, Xu Z-B, Meng D-Y (2019) Meta-weight-net: Learning an explicit mapping for sample weighting. In: NeurIPS, pp. 1919–1930

  • Sinha S, Ohashi H (2022) Difficulty-net: learning to predict difficulty for long-tailed recognition. arXiv preprint arXiv:2209.02960

  • Sinha S, Ohashi H, Nakamura K (2020) Class-wise difficulty-balanced loss for solving class-imbalance. In: ACCV. https://doi.org/10.48550/arXiv.2010.01824

  • Sinha S, Ohashi H, Nakamura K (2022) Class-difficulty based methods for long-tailed visual recognition. Int J Comput Vis 130(10):2517–2531

  • Tan J-R, Wang C-B, Li B-Y, Li Q-Q, Ouyang W-L, Yin C-Q, Yan J-J (2020) Equalization loss for long-tailed object recognition. In: CVPR, pp. 11662–11671. https://doi.org/10.1109/CVPR42600.2020.01168

  • Tang K-H, Huang J-Q, Zhang H-W (2020) Long-tailed classification by keeping the good and removing the bad momentum causal effect. NeurIPS 33:1513–1524

  • Torralba A, Fergus R, Freeman WT (2008) 80 million tiny images: a large data set for nonparametric object and scene recognition. IEEE Trans Pattern Anal Mach Intell 30(11):1958–1970. https://doi.org/10.1109/TPAMI.2008.128

  • Van der Maaten L, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9(11):2579–2605

  • Wan W-T, Chen J-S, Li T-P, Huang Y-Q, Tian J-Q, Yu C, Xue Y-Z (2019) Information entropy based feature pooling for convolutional neural networks. In: ICCV

  • Wang Y-X, Ramanan D, Hebert M (2017) Learning to model the tail. In: NeurIPS, vol. 30, pp. 1–11

  • Wang M, Lin Y, Min F, Liu D (2019) Cost-sensitive active learning through statistical methods. Inf Sci 501:460–482. https://doi.org/10.1016/j.ins.2019.06.015

  • Wang P, Han K, Wei X-S, Zhang L, Wang L (2021) Contrastive learning based hybrid networks for long-tailed image classification. In: CVPR, pp. 943–952

  • Wu Y-X, Min X-Y, Min F, Wang M (2019) Cost-sensitive active learning with a label uniform distribution model. Int J Approx Reason 105:49–65. https://doi.org/10.1016/j.ijar.2018.11.004

  • Wu Y-X, Hu Z-N, Wang Y-Y, Min F (2022) Rare potential poor household identification with a focus embedded logistic regression. IEEE Access 10:32954–32972. https://doi.org/10.1109/ACCESS.2022.3161574

  • Yang Y-Z, Xu Z (2020) Rethinking the value of labels for improving class-imbalanced learning. NeurIPS 33:19290–19301

  • Yu S-H, Guo J-F, Zhang R-Q, Fan Y-X, Wang Z-Z, Cheng X-Q (2022) A re-balancing strategy for class-imbalanced classification based on instance difficulty. In: CVPR, pp. 70–79

  • Zhang Y-F, Kang B-Y, Hooi B, Yan S-C, Feng J-S (2023) Deep long-tailed learning: a survey. IEEE Trans Pattern Anal Mach Intell 1:1–20. https://doi.org/10.1109/TPAMI.2023.3268118

  • Zhang S-Y, Li Z-M, Yan S-P, He X-M, Sun J (2021) Distribution alignment: a unified framework for long-tail visual recognition. In: CVPR, pp. 2361–2370

  • Zhong Z-S, Cui J-Q, Liu S, Jia J-Y (2021) Improving calibration for long-tailed recognition. In: CVPR, pp. 16489–16498

  • Zhou B, Lapedriza A, Khosla A, Oliva A, Torralba A (2018) Places: a 10 million image database for scene recognition. IEEE Trans Pattern Anal Mach Intell 40(6):1452–1464. https://doi.org/10.1109/TPAMI.2017.2723009

  • Zhu L-C, Yang Y (2020) Inflated episodic memory with region self-attention for long-tailed visual recognition. In: CVPR

Funding

This work was supported in part by Central Government Funds of Guiding Local Scientific and Technological Development (Grant Number 2021ZYD0003), the National Natural Science Foundation of China (Grant Number 62006200), the Sichuan Science and Technology Program of China (Grant Number 2021YFS0407), and the Sichuan Provincial Transfer Payment Program of China (Grant Number R21ZYZF0006). We thank Zhi-Heng Zhang for his valuable suggestions. We thank Liwen Bianji (Edanz) (www.liwenbianji.cn/) for editing the English text of a draft of this manuscript.

Author information

Contributions

YXW proposed the original methodology and wrote the main manuscript. FM supervised the paper writing and method improvement. BWZ and XJW wrote the code and undertook the experiments. All authors reviewed the manuscript.

Corresponding authors

Correspondence to Yan-Xue Wu or Fan Min.

Ethics declarations

Competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Discussion of cross-entropy based losses

We discuss seven types of cross-entropy based losses: TWL (King and Zeng 2001), FL (Lin et al. 2017), CBL (Cui et al. 2019), EQL (Tan et al. 2020), PolyLoss (Leng et al. 2022), CDB (Sinha et al. 2020), and DQBL. They fall into two categories: class-wise weighted loss (CWL) and sample-wise weighted loss (SWL). CWL examines properties of each class in the training or validation data and assigns weights class by class, whereas SWL uses the classification probability (i.e., p) of each instance as the basis for adjusting its weight during training. For instance, TWL assigns a fixed weight to each class based on the number of training samples, aiming to compensate for the difference between the sample distribution and the population distribution. CBL adjusts the weight of each class according to its effective number of instances. CDB measures class difficulty through the overall evaluation accuracy. In DQBL, the class weights are adjusted based on DQ. FL and PolyLoss belong to the SWL family because they adjust the sample weights according to p. FL widens the gap between the relative weights of easy and hard samples, thereby improving the classification accuracy on hard samples. PolyLoss treats the sample weights as polynomial functions of p; the loss is a weighted sum of polynomial terms, that is, a Taylor expansion of CEL or FL.
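
To make the CWL/SWL distinction concrete, the following minimal PyTorch sketch (our illustration, not the paper's code) implements FL, Poly-1, and the CBL class weights from their published formulas; the hyperparameter values and the toy data are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def focal_loss(logits, targets, gamma=2.0):
        # SWL: each sample is down-weighted by (1 - p)^gamma, where p is the
        # predicted probability of its true class (Lin et al. 2017).
        log_p = F.log_softmax(logits, dim=1)
        log_pt = log_p.gather(1, targets.unsqueeze(1)).squeeze(1)
        pt = log_pt.exp()
        return (-((1.0 - pt) ** gamma) * log_pt).mean()

    def poly1_loss(logits, targets, eps=2.0):
        # SWL: Poly-1 adds the leading polynomial term eps * (1 - p) to CEL,
        # i.e., a truncated Taylor-expansion view of the loss (Leng et al. 2022).
        pt = F.softmax(logits, dim=1).gather(1, targets.unsqueeze(1)).squeeze(1)
        ce = F.cross_entropy(logits, targets, reduction="none")
        return (ce + eps * (1.0 - pt)).mean()

    def class_balanced_weights(samples_per_class, beta=0.9999):
        # CWL: w_c proportional to (1 - beta) / (1 - beta^{n_c}), the inverse
        # effective number of samples (Cui et al. 2019); fixed for the whole run.
        n = torch.as_tensor(samples_per_class, dtype=torch.float)
        w = (1.0 - beta) / (1.0 - torch.pow(torch.tensor(beta), n))
        return w / w.sum() * len(samples_per_class)  # normalize to mean 1

    # Toy usage: 3 classes with long-tailed training counts (head to tail).
    logits, targets = torch.randn(8, 3), torch.randint(0, 3, (8,))
    weights = class_balanced_weights([5000, 500, 50])
    cbl = F.cross_entropy(logits, targets, weight=weights)  # class-wise
    fl, poly = focal_loss(logits, targets), poly1_loss(logits, targets)

Note that the class-balanced weights are computed once from the training counts, whereas the two sample-wise losses recompute their scaling from p in every forward pass.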

Next, we examine the flexibility of adjusting the weights for each loss, characterized by how frequently the weights are adjusted during training. The two SWLs, FL and PolyLoss, both adjust the sample weights batch by batch. Within CWL, TWL and CBL assign fixed weights for the entire training process. These fixed-weighting losses consider only an endogenous property of the training data, namely the distribution of each class. Softmax EQL also belongs to CWL; because of its random factor, it modifies the class weights batch by batch. In the machine learning domain, a general hypothesis is that the distributions of the training and validation data are consistent. In this paper, we satisfied this hypothesis because the training and validation data were randomly drawn from the same sample space. Although the distribution of the validation data may seem to be of little use, these data still carry exogenous properties, such as class difficulty (Sinha et al. 2020) and misclassification cost (Wu et al. 2019). In DQBL, we consider both validation accuracy and discriminant uncertainty to measure DQ throughout training. The class weights of DQBL are rebalanced epoch by epoch, similar to the reweighting mechanism of CDB. To summarize, the comparison among these losses in terms of flexibility is:

$$(\text{FL}, \text{PolyLoss}, \text{EQL})> (\text{DQBL}, \text{CDB}) > (\text{TWL}, \text{CBL}).$$
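
The following schematic training-loop sketch (our illustration; model, train_loader, val_loader, and compute_dq_weights are hypothetical placeholders, and the DQ measure itself is abstracted away) shows where each family's weight update sits in the loop.

    import torch
    import torch.nn.functional as F

    def train(model, train_loader, val_loader, num_classes, epochs, scheme="dqbl"):
        opt = torch.optim.SGD(model.parameters(), lr=0.1)
        weights = torch.ones(num_classes)  # TWL/CBL would fix these once, here.
        for epoch in range(epochs):
            for x, y in train_loader:
                logits = model(x)
                if scheme in ("fl", "polyloss", "eql"):
                    # Batch by batch: the effective sample weights change with
                    # p in every batch (illustrated with focal scaling).
                    pt = F.softmax(logits, 1).gather(1, y[:, None]).squeeze(1)
                    loss = (-((1 - pt) ** 2) * torch.log(pt + 1e-12)).mean()
                else:
                    # Fixed (TWL/CBL) or epoch-wise (DQBL/CDB) class weights.
                    loss = F.cross_entropy(logits, y, weight=weights)
                opt.zero_grad()
                loss.backward()
                opt.step()
            if scheme in ("dqbl", "cdb"):
                # Epoch by epoch: rebalance the class weights from validation
                # statistics; compute_dq_weights is a hypothetical stand-in
                # for the paper's DQ-based rebalancing.
                weights = compute_dq_weights(model, val_loader, num_classes)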

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wu, YX., Min, F., Zhang, BW. et al. Long-tailed image recognition through balancing discriminant quality. Artif Intell Rev 56 (Suppl 1), 833–856 (2023). https://doi.org/10.1007/s10462-023-10544-x
