A Comparative Analysis on Recent Methods for Addressing Imbalance Classification

  • Original Research
SN Computer Science

Abstract

In machine learning, the term "class imbalance" is used frequently. It is a central concern in the field: it plays an important role in classification and can significantly degrade performance, which is why researchers have concentrated on overcoming it. Numerous methods have been devised to date. The approaches to the imbalance problem found so far can be broadly divided into three categories: data-level, algorithm-level, and hybrid approaches. To evaluate the most recent developments in mitigating the negative effects of class imbalance, this study provides a comparative analysis of research published within the last 5 years, with an emphasis on high class imbalance. It gives a concise overview of what imbalanced classification is, how class imbalance arises, and what problems it causes. It also summarizes several studies published in the last few years and presents a comparative analysis of these approaches.
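The categories named in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' method, and the helper names `imbalance_ratio`, `balanced_class_weights`, and `smote_like_oversample` are hypothetical. The first function quantifies imbalance as the majority-to-minority count ratio. The second is an algorithm-level remedy: inverse-frequency class weights, using the same formula scikit-learn documents for `class_weight="balanced"`. The third is a data-level remedy: the core interpolation step of SMOTE (Chawla et al., 2002), simplified here to plain Python tuples.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def balanced_class_weights(labels):
    """Algorithm-level remedy: weight each class inversely to its frequency,
    w_c = n_samples / (n_classes * count_c)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * cnt) for c, cnt in counts.items()}


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Data-level remedy: the core SMOTE step. Each synthetic point lies on
    the segment between a random minority sample and one of its k nearest
    minority neighbours. `minority` is a list of equal-length float tuples
    (at least two points)."""
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours, excluding the base itself.
        others = [j for j in range(len(minority)) if j != i]
        nearest = sorted(others, key=lambda j: sq_dist(base, minority[j]))[:k]
        neighbour = minority[rng.choice(nearest)]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append(tuple(b + gap * (q - b) for b, q in zip(base, neighbour)))
    return synthetic


if __name__ == "__main__":
    labels = [0] * 90 + [1] * 10
    print(imbalance_ratio(labels))         # 9.0
    print(balanced_class_weights(labels))  # minority class gets weight 5.0
    minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
    print(smote_like_oversample(minority, n_new=3))
```

A hybrid approach, in the sense used above, would combine the two ideas, for example oversampling the minority class and then training a cost-sensitive learner with such weights.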

Fig. 1
Fig. 2
Fig. 3

Availability of Data and Materials

Data sharing is not applicable to this article.

Funding

None.

Author information

Contributions

Study conception, design, and analysis: ZA; draft manuscript preparation: ZA; supervision: SD. All authors reviewed the article and approved the final version of the manuscript.

Corresponding author

Correspondence to Sufal Das.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “SWOT to AI-embraced Communication Systems (SWOT-AI)” guest edited by Somnath Mukhopadhyay, Debashis De, Sunita Sarkar and Celia Shahnaz.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ahmed, Z., Das, S. A Comparative Analysis on Recent Methods for Addressing Imbalance Classification. SN COMPUT. SCI. 5, 30 (2024). https://doi.org/10.1007/s42979-023-02357-0

  • DOI: https://doi.org/10.1007/s42979-023-02357-0