Abstract
In this paper, we tackle the long-tailed visual recognition problem from the categorical prototype perspective by proposing a prototype-based classifier learning (PCL) method. Owing to their generalization ability and robustness, categorical prototypes are well suited to representing category semantics; their inherently class-balanced nature also makes them promising for handling data imbalance. In PCL, we generate the categorical classifiers from the prototypes via a learnable mapping function. To further alleviate the impact of imbalance on classifier generation, two classifier calibration approaches are designed, at the prototype level and at the example level. Extensive experiments on five benchmark datasets, including the large-scale iNaturalist, Places-LT, and ImageNet-LT, show that the proposed PCL outperforms state-of-the-art methods. Furthermore, validation experiments demonstrate the effectiveness of the tailored designs in PCL for long-tailed problems.
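The core idea, generating each class's classifier from its prototype rather than learning it directly from the imbalanced data, can be illustrated with a minimal sketch. Everything below is hypothetical and not the authors' implementation: toy Gaussian features stand in for learned representations, and a fixed linear map stands in for PCL's trained mapping function.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy long-tailed setup: a head class with 100 examples, a tail class with 5.
feats = {0: rng.normal(loc=1.0, size=(100, 8)),
         1: rng.normal(loc=-1.0, size=(5, 8))}

# Categorical prototypes: per-class feature means. One prototype per class,
# regardless of class frequency, hence class-balanced by construction.
prototypes = np.stack([feats[c].mean(axis=0) for c in (0, 1)])

# Classifier generation: map prototypes to classifier weights. Here an
# identity matrix stands in for the learnable mapping function of PCL.
W = np.eye(8)
classifiers = prototypes @ W

def predict(x):
    """Classify by cosine similarity to each generated classifier."""
    w = classifiers / np.linalg.norm(classifiers, axis=1, keepdims=True)
    return int(np.argmax(w @ (x / np.linalg.norm(x))))
```

Because the tail class contributes a prototype on equal footing with the head class, the generated tail classifier is not dwarfed by the head classifier, e.g., `predict(np.ones(8))` returns 0 and `predict(-np.ones(8))` returns 1 in this toy setting.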
Acknowledgements
This work was supported by National Key R&D Program of China (Grant No. 2021YFA1001100), National Natural Science Foundation of China (Grant Nos. 61925201, 62132001, U21B2025, 61871226), Natural Science Foundation of Jiangsu Province of China (Grant No. BK20210340), Fundamental Research Funds for the Central Universities (Grant No. 30920041111), CAAI-Huawei MindSpore Open Fund, and Beijing Academy of Artificial Intelligence (BAAI). We gratefully acknowledge the support of MindSpore, CANN (Compute Architecture for Neural Networks), and Ascend AI Processor used for this research.
Cite this article
Wei, XS., Xu, SL., Chen, H. et al. Prototype-based classifier learning for long-tailed visual recognition. Sci. China Inf. Sci. 65, 160105 (2022). https://doi.org/10.1007/s11432-021-3489-1