
KA-Ensemble: towards imbalanced image classification ensembling under-sampling and over-sampling

Multimedia Tools and Applications

Abstract

Imbalanced learning has become a research emphasis in recent years owing to the growing number of class-imbalance classification problems in real applications, and it is particularly challenging when the imbalance ratio is very high. Sampling, including under-sampling and over-sampling, is an intuitive and popular way of dealing with class-imbalance problems: it regroups the original dataset and has proved to be efficient. Its main deficiency is that under-sampling methods usually ignore many majority-class examples, while over-sampling methods may easily cause over-fitting. In this paper, we propose a new algorithm, dubbed KA-Ensemble, that ensembles under-sampling and over-sampling to overcome this issue. KA-Ensemble adopts the EasyEnsemble framework, randomly under-sampling the majority class while simultaneously over-sampling the minority class via kernel-based adaptive synthetic sampling (Kernel-ADASYN), yielding a group of balanced datasets on which corresponding classifiers are trained separately; the final result is decided by a vote of all these trained classifiers. By combining under-sampling and over-sampling in this way, KA-Ensemble is well suited to class-imbalance problems with a large imbalance ratio. We evaluated the proposed method against state-of-the-art sampling methods on 9 image classification datasets with imbalance ratios ranging from less than 2 to more than 15, and the experimental results show that KA-Ensemble performs better in terms of accuracy (ACC), F-Measure, G-Mean, and area under the curve (AUC). Moreover, it can be applied to both binary and multi-class classification, and to other class-imbalance problems beyond image classification.
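To make the ensembling scheme described in the abstract concrete, the following Python sketch gives a minimal reading of the idea for a binary problem with the majority class labeled 0 and the minority class labeled 1. It is not the authors' implementation: kernel_adasyn_oversample approximates Kernel-ADASYN with scikit-learn's KernelDensity (dropping the density-based adaptive weighting of the real method), and the base classifier, kernel bandwidth, ensemble size, and per-class balanced size are all assumptions.

```python
import numpy as np
from sklearn.neighbors import KernelDensity
from sklearn.tree import DecisionTreeClassifier


def kernel_adasyn_oversample(X_min, n_new, bandwidth=0.5, random_state=0):
    """Crude stand-in for Kernel-ADASYN: fit a Gaussian KDE on the minority
    class and draw synthetic minority samples from the estimated density."""
    kde = KernelDensity(kernel="gaussian", bandwidth=bandwidth).fit(X_min)
    return kde.sample(n_samples=n_new, random_state=random_state)


def fit_ka_ensemble(X, y, n_estimators=10, seed=0):
    """EasyEnsemble-style loop: each round pairs a random under-sample of the
    majority class with a Kernel-ADASYN-style over-sample of the minority
    class, giving one balanced set per base classifier."""
    rng = np.random.RandomState(seed)
    X_maj, X_min = X[y == 0], X[y == 1]
    # Balanced size per class: halfway between the two class sizes (assumption).
    n_bal = (len(X_maj) + len(X_min)) // 2
    models = []
    for i in range(n_estimators):
        keep = rng.choice(len(X_maj), size=n_bal, replace=False)   # under-sample
        X_syn = kernel_adasyn_oversample(X_min, n_bal - len(X_min),
                                         random_state=seed + i)    # over-sample
        X_bal = np.vstack([X_maj[keep], X_min, X_syn])
        y_bal = np.hstack([np.zeros(n_bal), np.ones(n_bal)])
        models.append(DecisionTreeClassifier(random_state=i).fit(X_bal, y_bal))
    return models


def predict_ka_ensemble(models, X):
    """Final prediction by majority vote over the trained classifiers."""
    votes = np.stack([m.predict(X) for m in models])
    return (votes.mean(axis=0) >= 0.5).astype(int)
```

Because each round draws a different random majority subset, the vote aggregates classifiers that together cover far more majority examples than a single under-sampled set would, while the KDE-based over-sampling keeps each round's minority class from being a verbatim copy.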



Author information


Correspondence to Bin Wei or Haiyong Zheng.

Additional information


This work was supported in part by the National Natural Science Foundation of China under Grants 61771440, 41776113 and 41674037, in part by the Qingdao Municipal Science and Technology Program under Grant 17-1-1-5-jch, in part by the China Scholarship Council under Grant 201806335022, and in part by the Foundation of Shandong provincial Key Laboratory of Digital Medicine and Computer Assisted Surgery under Grant SDKL-DMCAS-2018-07.


About this article


Cite this article

Ding, H., Wei, B., Gu, Z. et al. KA-Ensemble: towards imbalanced image classification ensembling under-sampling and over-sampling. Multimed Tools Appl 79, 14871–14888 (2020). https://doi.org/10.1007/s11042-019-07856-y

