
Prototype-based classifier learning for long-tailed visual recognition

  • Research Paper
  • Special Focus on Deep Learning for Computer Vision
  • Published in: Science China Information Sciences

Abstract

In this paper, we tackle the long-tailed visual recognition problem from the categorical prototype perspective by proposing a prototype-based classifier learning (PCL) method. Owing to their generalization ability and robustness, categorical prototypes are well suited to representing category semantics; coupled with their class-balanced nature, they also show potential for handling data imbalance. In PCL, we generate the categorical classifiers from the prototypes through a learnable mapping function. To further alleviate the impact of imbalance on classifier generation, two kinds of classifier calibration approaches are designed, at the prototype level and the example level. Extensive experiments on five benchmark datasets, including the large-scale iNaturalist, Places-LT, and ImageNet-LT, show that the proposed PCL outperforms state-of-the-art methods. Furthermore, validation experiments demonstrate the effectiveness of the designs in PCL tailored to long-tailed problems.
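To make the core idea stated in the abstract concrete, the following minimal Python sketch computes categorical prototypes as class-mean features and maps them to per-class classifier weights through a learnable mapping. The module names, the small MLP used as the mapping, the cosine-similarity scoring, and the scale factor are illustrative assumptions rather than the paper's actual design; the prototype-level and example-level calibration steps are not reproduced here.

```python
# Sketch of prototype-based classifier generation, assuming a PyTorch setup.
# All shapes, hyper-parameters, and the MLP mapping are hypothetical choices.
import torch
import torch.nn as nn
import torch.nn.functional as F


def class_prototypes(features: torch.Tensor, labels: torch.Tensor,
                     num_classes: int) -> torch.Tensor:
    """Average the feature vectors of each class to obtain one prototype per class."""
    dim = features.size(1)
    protos = torch.zeros(num_classes, dim, device=features.device)
    counts = torch.zeros(num_classes, device=features.device)
    protos.index_add_(0, labels, features)
    counts.index_add_(0, labels, torch.ones_like(labels, dtype=torch.float))
    return protos / counts.clamp(min=1).unsqueeze(1)


class PrototypeClassifier(nn.Module):
    """Generates per-class classifier weights from prototypes via a learnable mapping."""

    def __init__(self, feat_dim: int = 512, hidden_dim: int = 512):
        super().__init__()
        # Hypothetical mapping network: a small MLP from prototype to classifier weight.
        self.mapping = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, feat_dim),
        )

    def forward(self, features: torch.Tensor, prototypes: torch.Tensor) -> torch.Tensor:
        weights = self.mapping(prototypes)  # (C, D) generated classifier weights
        # Cosine-similarity logits keep per-class weight norms comparable, one common
        # way to reduce the bias toward head classes; the scale 20.0 is an assumption.
        logits = 20.0 * F.normalize(features, dim=1) @ F.normalize(weights, dim=1).t()
        return logits


if __name__ == "__main__":
    feats = torch.randn(32, 512)            # backbone features for a mini-batch
    labels = torch.randint(0, 10, (32,))
    protos = class_prototypes(feats, labels, num_classes=10)
    model = PrototypeClassifier(feat_dim=512)
    print(model(feats, protos).shape)       # torch.Size([32, 10])
```

In the actual method the mapping would presumably be trained jointly with the feature extractor and then calibrated at the prototype and example levels; the cosine scoring above is only one plausible instantiation of classifier generation from prototypes.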

Acknowledgements

This work was supported by National Key R&D Program of China (Grant No. 2021YFA1001100), National Natural Science Foundation of China (Grant Nos. 61925201, 62132001, U21B2025, 61871226), Natural Science Foundation of Jiangsu Province of China (Grant No. BK20210340), Fundamental Research Funds for the Central Universities (Grant No. 30920041111), CAAI-Huawei MindSpore Open Fund, and Beijing Academy of Artificial Intelligence (BAAI). We gratefully acknowledge the support of MindSpore, CANN (Compute Architecture for Neural Networks), and Ascend AI Processor used for this research.

Author information

Correspondence to Liang Xiao or Yuxin Peng.

About this article

Cite this article

Wei, XS., Xu, SL., Chen, H. et al. Prototype-based classifier learning for long-tailed visual recognition. Sci. China Inf. Sci. 65, 160105 (2022). https://doi.org/10.1007/s11432-021-3489-1
