Abstract
Despite the attention that long-tailed classification has received in recent years, the tail classes still perform worse than the remaining classes. We address this problem with a novel data augmentation technique called Targeted Mixup, which mixes class samples based on the model's performance on each class. Instances of classes that are difficult to distinguish are randomly chosen and linearly interpolated to produce a new sample, directing the model's attention to those two classes. The expectation is that the model learns the distinguishing features and thereby improves classification of instances belonging to the respective classes. To demonstrate the effectiveness of the proposed method empirically, we performed experiments on the CIFAR-100-LT, Places-LT, and Speech Commands-LT datasets. The results show an improvement on the few-shot classes without sacrificing much of the model's performance on the many-shot and medium-shot classes; in fact, overall accuracy increased as well.
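The abstract's description of Targeted Mixup — pick a pair of frequently confused classes, then linearly interpolate one instance from each — can be illustrated with a minimal sketch. This is not the paper's exact formulation; the function names and the error-proportional pair-selection rule below are hypothetical stand-ins assumed for illustration, with the interpolation itself following standard mixup.

```python
import numpy as np

def pick_confusable_pair(class_error, rng=None):
    """Hypothetical targeting rule: sample two distinct classes with
    probability proportional to their current per-class error rates,
    so mixing concentrates on classes the model confuses most."""
    rng = rng or np.random.default_rng()
    p = class_error / class_error.sum()
    return rng.choice(len(class_error), size=2, replace=False, p=p)

def targeted_mixup(x_a, y_a, x_b, y_b, num_classes, alpha=1.0, rng=None):
    """Standard mixup interpolation applied to the targeted pair:
    blend inputs and one-hot labels with the same Beta-drawn weight."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)                      # mixing coefficient in [0, 1]
    x_mix = lam * x_a + (1.0 - lam) * x_b             # interpolated input
    eye = np.eye(num_classes)
    y_mix = lam * eye[y_a] + (1.0 - lam) * eye[y_b]   # soft label
    return x_mix, y_mix
```

Under this sketch, a training step would first call `pick_confusable_pair` with the model's running per-class error estimates, draw one instance from each selected class, and feed the mixed sample and soft label to an ordinary cross-entropy-style loss.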
Data Availability
Publicly available datasets were analyzed in this study. These data can be found here (accessed on 9 October 2024):
− https://www.cs.toronto.edu/~kriz/cifar.html
− https://liuziwei7.github.io/projects/LongTail.html
Speech Commands-LT, the long-tailed version of the Speech Commands dataset that we created, is available at the following (accessed on 1 April 2024):
Acknowledgements
The authors express gratitude to the Dongseo University Machine Learning/Deep Learning Research Lab members and the anonymous reviewers for their valuable insights and feedback on earlier versions of this paper.
Funding
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (NRF-2022R1A2C2012243).
Author information
Contributions
The authors confirm contribution to the paper as follows:
− Conceptualization: Y. Darkwah Jnr., D.-K. Kang
− Methodology: Y. Darkwah Jnr.
− Software: Y. Darkwah Jnr.
− Validation: D.-K. Kang
− Formal analysis: Y. Darkwah Jnr.
− Investigation: Y. Darkwah Jnr.
− Resources: D.-K. Kang
− Data curation: Y. Darkwah Jnr.
− Writing – original draft preparation: Y. Darkwah Jnr.
− Writing – review and editing: D.-K. Kang
− Visualization: Y. Darkwah Jnr.
− Supervision: D.-K. Kang
− Project administration: D.-K. Kang
− Funding acquisition: D.-K. Kang
All authors reviewed the results and approved the final version of the manuscript.
Ethics declarations
Competing Interests
The author(s) declare that they have no competing interests.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Darkwah Jnr., Y., Kang, DK. Enhancing few-shot learning using targeted mixup. Appl Intell 55, 279 (2025). https://doi.org/10.1007/s10489-024-06157-8