Abstract
Long-tailed image recognition is a challenging task in real-world scenarios with large-scale data. Popular strategies, such as loss reweighting and data resampling, aim to reduce the model bias toward head classes. Specifically, different loss reweighting approaches explore various endogenous or exogenous measures. In this paper, we study a new endogenous measure called discriminant quality (DQ), which considers both validation accuracy and discriminant uncertainty. DQ takes advantage of continuous information accumulated over a period of time. It is more robust than instantaneous information because it mitigates the measurement instability caused by random perturbations during training. Additionally, the weight of each class is automatically rebalanced based on DQ, and the class weights support a dynamic updating strategy driven by the significance of DQ differences. Experiments on MNIST-LT, CIFAR-100-LT, ImageNet-LT, and Places-LT demonstrated the superiority of DQ over state-of-the-art methods in terms of prediction accuracy.
References
Bengio S (2015) Sharing representations for long tail computer vision problems. In: ICMI, pp. 1–1. https://doi.org/10.1145/2818346.2818348
Cai J-R, Wang Y-Z, Hwang J-N (2021) Ace: ally complementary experts for solving long-tailed recognition in one-shot. In: ICCV, pp. 112–121
Cao K, Wei C, Gaidon A, Aréchiga N, Ma T (2019) Learning imbalanced datasets with label-distribution-aware margin loss. In: NeurIPS, pp. 1567–1578. http://arxiv.org/abs/1906.07413
Chen T, Kornblith S, Norouzi M, Hinton G (2020) A simple framework for contrastive learning of visual representations. In: ICML. https://doi.org/10.5555/3524938.3525087
Cui Y, Song Y, Sun C, Howard A, Belongie S (2018) Large scale fine-grained categorization and domain-specific transfer learning. In: CVPR, pp. 4109–4118
Cui Y, Jia M-L, Lin T-Y, Song Y, Belongie S (2019) Class-balanced loss based on effective number of samples. In: CVPR, pp. 9268–9277. https://doi.org/10.1109/cvpr.2019.00949
Cui J-Q, Zhong Z-S, Liu S, Yu B, Jia J-Y (2021) Parametric contrastive learning. In: ICCV, pp. 715–724
Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L (2009) Imagenet: a large-scale hierarchical image database. In: CVPR, vol. 1, pp. 248–255. https://doi.org/10.1109/CVPR.2009.5206848
Duggal R, Freitas S, Dhamnani S, Chau DH, Sun J (2020) ELF: an early-exiting framework for long-tailed classification. arXiv preprint arXiv:2006.11979
Finn C, Abbeel P, Levine S (2017) Model-agnostic meta-learning for fast adaptation of deep networks. In: ICML, vol. 70, pp. 1126–1135
Goyal P, Dollár P, Girshick R, Noordhuis P, Wesolowski L, Kyrola A, Tulloch A, Jia Y, He K (2018) Accurate, large minibatch SGD: training imagenet in 1 hour. https://doi.org/10.48550/arXiv.1706.02677
He H-B, Bai Y, Garcia EA, Li S (2008) Adasyn: adaptive synthetic sampling approach for imbalanced learning. In: IJCNN, pp. 1322–1328. https://doi.org/10.1109/IJCNN.2008.4633969
He K-M, Zhang X-Y, Ren S-Q, Sun J (2016) Deep residual learning for image recognition. In: CVPR, pp. 770–778. https://doi.org/10.1109/CVPR.2016.90
He K, Fan H-Q, Wu Y-X, Xie S-N, Girshick R (2020) Momentum contrast for unsupervised visual representation learning. In: CVPR
Hong Y, Han S, Choi K, Seo S, Kim B, Chang B (2021) Disentangling label distribution for long-tailed visual recognition. In: CVPR, pp. 6626–6636
Huang C, Li Y, Loy CC, Tang X-O (2016) Learning deep representation for imbalanced classification. In: CVPR
Jamal MA, Brown M, Yang M-H, Wang L, Gong B (2020) Rethinking class-balanced methods for long-tailed visual recognition from a domain adaptation perspective. In: CVPR
Johnson JM, Khoshgoftaar TM (2019) Survey on deep learning with class imbalance. J Big Data 6(27):1–54. https://doi.org/10.1186/s40537-019-0192-5
Kang B-Y, Xie S-N, Rohrbach M, Yan Z-C, Gordo A, Feng J-S, Kalantidis Y (2020) Decoupling representation and classifier for long-tailed recognition. In: ICLR. https://openreview.net/forum?id=r1gRTCVFvB
Khosla P, Teterwak P, Wang C, Sarna A, Tian Y, Isola P, Maschinot A, Liu C, Krishnan D (2020) Supervised contrastive learning. In: NeurIPS, vol. 33, pp. 18661–18673
King G, Zeng L-C (2001) Logistic regression in rare events data. Soc Sci Electron Publ 9(2):137–163. https://doi.org/10.1093/oxfordjournals.pan.a004868
Krizhevsky A, Hinton GE (2009) Learning multiple layers of features from tiny images
LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324. https://doi.org/10.1109/5.726791
Leng Z-Q, Tan M-X, Liu C-X, Cubuk ED, Shi J, Cheng S-Y, Anguelov D (2022) Polyloss: a polynomial expansion perspective of classification loss functions. In: ICLR
Lin T-Y, Goyal P, Girshick R, He K-M, Dollár P (2017) Focal loss for dense object detection. In: ICCV, pp. 2980–2988. https://doi.org/10.1109/iccv.2017.324
Liu Z-W, Miao Z-Q, Zhang X-H, Wang J-Y, Guo B-Q, Yu SX (2019) Large-scale long-tailed recognition in an open world. In: CVPR
Liu B, Li H-X, Kang H, Hua G, Vasconcelos N (2021) Gistnet: a geometric structure transfer network for long-tailed recognition. In: ICCV, pp. 8189–8198
Mahajan D, Girshick R, Ramanathan V, He K, Paluri M, Li Y, Bharambe A, van der Maaten L (2018) Exploring the limits of weakly supervised pretraining. In: ECCV, pp. 185–201
Ouyang W-L, Wang X-G, Zhang C, Yang X-K (2016) Factors in finetuning deep model for object detection with long-tail distribution. In: CVPR, pp. 864–873. https://doi.org/10.1109/CVPR.2016.100
Ren M-Y, Zeng W-Y, Yang B, Urtasun R (2018) Learning to reweight examples for robust deep learning. In: ICML
Ren J-W, Yu C-J, Sheng S, Ma X, Zhao H-Y, Yi S, Li H-S (2020) Balanced meta-softmax for long-tailed visual recognition. In: NeurIPS, vol. 33, pp. 4175–4186
Samuel D, Chechik G (2021) Distributional robustness loss for long-tail learning. In: ICCV, pp. 9495–9504
Shen L, Lin Z, Huang Q (2016) Relay backpropagation for effective learning of deep convolutional neural networks. In: ECCV, pp. 467–482
Shu J, Xie Q, Yi L-X, Zhao Q, Zhou S-P, Xu Z-B, Meng D-Y (2019) Meta-weight-net: Learning an explicit mapping for sample weighting. In: NeurIPS, pp. 1919–1930
Sinha S, Ohashi H (2022) Difficulty-net: learning to predict difficulty for long-tailed recognition. arXiv preprint arXiv:2209.02960
Sinha S, Ohashi H, Nakamura K (2020) Class-wise difficulty-balanced loss for solving class-imbalance. In: ACCV. https://doi.org/10.48550/arXiv.2010.01824
Sinha S, Ohashi H, Nakamura K (2022) Class-difficulty based methods for long-tailed visual recognition. Int J Comput Vis 130(10):2517–2531
Tan J-R, Wang C-B, Li B-Y, Li Q-Q, Ouyang W-L, Yin C-Q, Yan J-J (2020) Equalization loss for long-tailed object recognition. In: CVPR, pp. 11662–11671. https://doi.org/10.1109/CVPR42600.2020.01168
Tang K-H, Huang J-Q, Zhang H-W (2020) Long-tailed classification by keeping the good and removing the bad momentum causal effect. NeurIPS 33:1513–1524
Torralba A, Fergus R, Freeman WT (2008) 80 million tiny images: a large data set for nonparametric object and scene recognition. IEEE Trans Pattern Anal Mach Intell 30(11):1958–1970. https://doi.org/10.1109/TPAMI.2008.128
Van der Maaten L, Hinton G (2008) Visualizing data using t-sne. J Mach Learn Res 9(11):2579–2605
Wan W-T, Chen J-S, Li T-P, Huang Y-Q, Tian J-Q, Yu C, Xue Y-Z (2019) Information entropy based feature pooling for convolutional neural networks. In: ICCV
Wang Y-X, Ramanan D, Hebert M (2017) Learning to model the tail. In: NeurIPS, vol. 30, pp. 1–11
Wang M, Lin Y, Min F, Liu D (2019) Cost-sensitive active learning through statistical methods. Inf Sci 501:460–482. https://doi.org/10.1016/j.ins.2019.06.015
Wang P, Han K, Wei X-S, Zhang L, Wang L (2021) Contrastive learning based hybrid networks for long-tailed image classification. In: CVPR, pp. 943–952
Wu Y-X, Min X-Y, Min F, Wang M (2019) Cost-sensitive active learning with a label uniform distribution model. Int J Approx Reason 105:49–65. https://doi.org/10.1016/j.ijar.2018.11.004
Wu Y-X, Hu Z-N, Wang Y-Y, Min F (2022) Rare potential poor household identification with a focus embedded logistic regression. IEEE Access 10:32954–32972. https://doi.org/10.1109/ACCESS.2022.3161574
Yang Y-Z, Xu Z (2020) Rethinking the value of labels for improving class-imbalanced learning. NeurIPS 33:19290–19301
Yu S-H, Guo J-F, Zhang R-Q, Fan Y-X, Wang Z-Z, Cheng X-Q (2022) A re-balancing strategy for class-imbalanced classification based on instance difficulty. In: CVPR, pp. 70–79
Zhang Y-F, Kang B-Y, Hooi B, Yan S-C, Feng J-S (2021) Deep long-tailed learning: a survey. IEEE Trans Pattern Anal Mach Intell 1:1–20. https://doi.org/10.1109/TPAMI.2023.3268118
Zhang S-Y, Li Z-M, Yan S-P, He X-M, Sun J (2021) Distribution alignment: a unified framework for long-tail visual recognition. In: CVPR, pp. 2361–2370
Zhong Z-S, Cui J-Q, Liu S, Jia J-Y (2021) Improving calibration for long-tailed recognition. In: CVPR, pp. 16489–16498
Zhou B, Lapedriza A, Khosla A, Oliva A, Torralba A (2018) Places: a 10 million image database for scene recognition. IEEE Trans Pattern Anal Mach Intell 40(6):1452–1464. https://doi.org/10.1109/TPAMI.2017.2723009
Zhu L-C, Yang Y (2020) Inflated episodic memory with region self-attention for long-tailed visual recognition. In: CVPR
Funding
This work was supported in part by Central Government Funds of Guiding Local Scientific and Technological Development (Grant Number 2021ZYD0003), the National Natural Science Foundation of China (Grant Number 62006200), the Sichuan Science and Technology Program of China (Grant Number 2021YFS0407), and the Sichuan Provincial Transfer Payment Program of China (Grant Number R21ZYZF0006). We thank Zhi-Heng Zhang for his valuable suggestions. We thank Liwen Bianji (Edanz) (www.liwenbianji.cn/) for editing the English text of a draft of this manuscript.
Author information
Authors and Affiliations
Contributions
YXW proposed the original methodology and wrote the main manuscript. FM supervised the paper writing and method improvement. BWZ and XJW wrote the code and undertook the experiments. All authors reviewed the manuscript.
Corresponding authors
Ethics declarations
Competing interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix: Discussion of cross-entropy based losses
We discuss seven types of cross-entropy based losses: TWL (King and Zeng 2001), FL (Lin et al. 2017), CBL (Cui et al. 2019), EQL (Tan et al. 2020), PolyLoss (Leng et al. 2022), CDB (Sinha et al. 2020), and DQBL. They fall into two categories: class-wise weighted loss (CWL) and sample-wise weighted loss (SWL). CWL examines properties of each class in the training or validation data and assigns weights class by class, whereas SWL uses the classification probability (i.e., p) of each instance as the basis for adjusting sample weights during training. For instance, TWL assigns fixed weights to each class using the number of training samples, aiming to compensate for differences between the sample distribution and the population distribution. CBL considers the effective number of instances, thereby adjusting the weight of each class. CDB measures class difficulty by considering the overall evaluation accuracy. In DQBL, the class weights are adjusted based on DQ. FL and PolyLoss belong to the SWL family because they adjust the sample weights according to p. FL widens the gap in relative weights between easy and hard samples, thus improving the classification accuracy of hard samples. PolyLoss treats the sample weights as polynomial functions of p; the weighted sum corresponds to the Taylor expansion of CEL or FL.
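The p-dependent weightings discussed above can be sketched for a single sample. The formulas below follow the published definitions of FL, Poly-1 (the simplest PolyLoss variant), and the CBL effective-number weight; the hyperparameter values `gamma`, `eps`, and `beta` are illustrative defaults, not those tuned in this paper.

```python
import math

def cross_entropy(p):
    # Standard CE for the true class with predicted probability p.
    return -math.log(p)

def focal_loss(p, gamma=2.0):
    # FL down-weights easy samples (p near 1) via the factor (1 - p)^gamma,
    # widening the gap between easy and hard samples.
    return (1.0 - p) ** gamma * cross_entropy(p)

def poly1_loss(p, eps=1.0):
    # Poly-1: CE plus a perturbation of the first polynomial term (1 - p)
    # in the Taylor expansion of CE.
    return cross_entropy(p) + eps * (1.0 - p)

def class_balanced_weight(n, beta=0.999):
    # CBL: class weight inversely proportional to the effective number
    # of samples (1 - beta^n) / (1 - beta); rarer classes get larger weights.
    return (1.0 - beta) / (1.0 - beta ** n)
```

Note that `focal_loss` and `poly1_loss` are computed per sample (SWL), while `class_balanced_weight` depends only on the class size `n` (CWL).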
Next, we examine the flexibility of weight adjustment for each loss, characterized by the weight-adjusting frequency during training. The two SWLs, FL and PolyLoss, both adjust weights batch by batch. Within the CWL family, TWL and CBL assign fixed weights for the entire training process. These fixed-weighting losses only consider an endogenous property of the training data, namely the distribution of each class. Softmax EQL also belongs to CWL; because of its random factor, it modifies the class weights batch by batch. In the machine learning domain, a general hypothesis is that the distributions of training and validation data are consistent. In this paper, we satisfied this hypothesis because the training and validation data were randomly drawn from the same sample space. Although the distribution of the validation data seems to be of little use, these data still carry exogenous properties, such as class difficulty (Sinha et al. 2020) and misclassification cost (Wu et al. 2019). In DQBL, we consider both validation accuracy and discriminant uncertainty to measure DQ throughout training. The class weights of DQBL are rebalanced epoch by epoch, which is similar to the reweighting mechanism of CDB. To summarize, the comparison among these losses in terms of flexibility is:
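The epoch-wise rebalancing shared by CDB and DQBL can be sketched generically. The sketch below is illustrative only: `scores` stands in for any per-class quality measure evaluated on validation data at the end of an epoch, and the inverse-power rule with sharpness `tau` is a hypothetical placeholder, not the paper's DQ formula.

```python
def rebalance_weights(scores, tau=1.0):
    """Epoch-wise class reweighting: classes with lower scores
    (harder classes) receive larger weights.

    scores: per-class quality measures in (0, 1], e.g. validation accuracy.
    tau: sharpness of the reweighting; tau = 0 recovers uniform weights.
    """
    # Invert the scores so that poorly performing classes are emphasized;
    # the epsilon guards against division by zero.
    inv = [(1.0 / max(s, 1e-8)) ** tau for s in scores]
    # Normalize so the weights average to 1 across the k classes,
    # keeping the overall loss scale comparable between epochs.
    k = len(scores)
    total = sum(inv)
    return [k * w / total for w in inv]
```

In a training loop, such a function would be called once per epoch with the latest validation measurements, in contrast to FL and PolyLoss, whose weights change with every batch.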
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wu, YX., Min, F., Zhang, BW. et al. Long-tailed image recognition through balancing discriminant quality. Artif Intell Rev 56 (Suppl 1), 833–856 (2023). https://doi.org/10.1007/s10462-023-10544-x
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10462-023-10544-x