
KA-Ensemble: towards imbalanced image classification ensembling under-sampling and over-sampling

Multimedia Tools and Applications

Abstract

Imbalanced learning has become a research emphasis in recent years owing to the growing number of class-imbalance classification problems in real applications, and it is particularly challenging when the imbalance ratio is very high. Sampling, including under-sampling and over-sampling, is an intuitive and popular way of dealing with class-imbalance problems: it regroups the original dataset and has proved to be efficient. Its main deficiency is that under-sampling methods usually ignore many majority-class examples, while over-sampling methods may easily cause over-fitting. In this paper, we propose a new algorithm, dubbed KA-Ensemble, that ensembles under-sampling and over-sampling to overcome this issue. KA-Ensemble adopts the EasyEnsemble framework, randomly under-sampling the majority class while simultaneously over-sampling the minority class via kernel-based adaptive synthetic sampling (Kernel-ADASYN), yielding a group of balanced datasets on which corresponding classifiers are trained separately; the final result is decided by a vote of all these trained classifiers. By combining under-sampling and over-sampling in this way, KA-Ensemble is well suited to class-imbalance problems with a large imbalance ratio. We evaluated the proposed method against state-of-the-art sampling methods on 9 image classification datasets with imbalance ratios ranging from less than 2 to more than 15, and the experimental results show that KA-Ensemble performs better in terms of accuracy (ACC), F-Measure, G-Mean, and area under the curve (AUC). Moreover, it can be applied to both binary and multi-class classification, and to other class-imbalance problems beyond image classification.
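To make the ensembling scheme described in the abstract concrete, the following Python sketch gives a minimal reading of the idea for a binary problem with the majority class labeled 0 and the minority class labeled 1. It is not the authors' implementation: kernel_adasyn_oversample approximates Kernel-ADASYN with scikit-learn's KernelDensity (dropping the density-based adaptive weighting of the real method), and the base classifier, kernel bandwidth, ensemble size, and per-class balanced size are all assumptions.

```python
import numpy as np
from sklearn.neighbors import KernelDensity
from sklearn.tree import DecisionTreeClassifier


def kernel_adasyn_oversample(X_min, n_new, bandwidth=0.5, random_state=0):
    """Crude stand-in for Kernel-ADASYN: fit a Gaussian KDE on the minority
    class and draw synthetic minority samples from the estimated density."""
    kde = KernelDensity(kernel="gaussian", bandwidth=bandwidth).fit(X_min)
    return kde.sample(n_samples=n_new, random_state=random_state)


def fit_ka_ensemble(X, y, n_estimators=10, seed=0):
    """EasyEnsemble-style loop: each round pairs a random under-sample of the
    majority class with a Kernel-ADASYN-style over-sample of the minority
    class, giving one balanced set per base classifier."""
    rng = np.random.RandomState(seed)
    X_maj, X_min = X[y == 0], X[y == 1]
    # Balanced size per class: halfway between the two class sizes (assumption).
    n_bal = (len(X_maj) + len(X_min)) // 2
    models = []
    for i in range(n_estimators):
        keep = rng.choice(len(X_maj), size=n_bal, replace=False)   # under-sample
        X_syn = kernel_adasyn_oversample(X_min, n_bal - len(X_min),
                                         random_state=seed + i)    # over-sample
        X_bal = np.vstack([X_maj[keep], X_min, X_syn])
        y_bal = np.hstack([np.zeros(n_bal), np.ones(n_bal)])
        models.append(DecisionTreeClassifier(random_state=i).fit(X_bal, y_bal))
    return models


def predict_ka_ensemble(models, X):
    """Final prediction by majority vote over the trained classifiers."""
    votes = np.stack([m.predict(X) for m in models])
    return (votes.mean(axis=0) >= 0.5).astype(int)
```

Because each round draws a different random majority subset, the vote aggregates classifiers that together cover far more majority examples than a single under-sampled set would, while the KDE-based over-sampling keeps each round's minority class from being a verbatim copy.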



Author information


Correspondence to Bin Wei or Haiyong Zheng.

Additional information


This work was supported in part by the National Natural Science Foundation of China under Grants 61771440, 41776113 and 41674037, in part by the Qingdao Municipal Science and Technology Program under Grant 17-1-1-5-jch, in part by the China Scholarship Council under Grant 201806335022, and in part by the Foundation of Shandong provincial Key Laboratory of Digital Medicine and Computer Assisted Surgery under Grant SDKL-DMCAS-2018-07.


About this article


Cite this article

Ding, H., Wei, B., Gu, Z. et al. KA-Ensemble: towards imbalanced image classification ensembling under-sampling and over-sampling. Multimed Tools Appl 79, 14871–14888 (2020). https://doi.org/10.1007/s11042-019-07856-y

