A fused grey wolf and artificial bee colony model for imbalanced data classification problems

  • Original Article
International Journal of System Assurance Engineering and Management

Abstract

The issue of imbalanced datasets, i.e., uneven sample distribution among different classes, causes training biases and degrades the performance of learning algorithms. In the past, several solutions for handling data imbalance have been proposed, but most of them focus on removing majority class instances, which leads to the loss of important information. An alternative strategy investigated in the literature is to generate minority class samples. However, generating high-quality synthetic samples for the minority class remains an open problem. In this study, a fusion of the grey wolf optimizer (GWO) with the artificial bee colony (ABC) algorithm is proposed to generate representative samples of the minority class. The combination is analysed because GWO has good exploitation abilities, while ABC is good at exploration. The effectiveness of the proposed method is tested, using standard assessment indicators, on 20 real-world benchmark datasets and on one real-life application, i.e., scam video classification on YouTube. The performance of the proposed method is compared against 18 state-of-the-art data imbalance handling methods using three classification algorithms, i.e., support vector machine (SVM), k-nearest neighbours (KNN) and decision tree (DT). Our experimental results show an improvement in G-mean score on 18 out of 20 datasets for SVM, with a maximum improvement of 8%, and on 17 out of 20 datasets for KNN and DT, with maximum improvements of 10.7% and 6.3%, respectively. An improvement in AUC score is also seen on 17 out of 20 datasets for SVM and DT, with maximum improvements of 4.5% and 6%, respectively, and on 16 out of 20 datasets for KNN, with a maximum improvement of 7.7%. These results show that the proposed method is robust.
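As a brief illustration of why G-mean is reported for imbalanced data, the sketch below computes it for a binary problem. It assumes the common definition, the geometric mean of sensitivity and specificity; the abstract does not state the exact formula the authors used, and the function name `g_mean` and the toy labels are hypothetical.

```python
import math


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity for binary labels.

    Assumed definition (not taken from the article): sqrt(TPR * TNR),
    with `positive` marking the minority class.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    # Guard against a class that is absent from y_true.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Toy data: 2 minority (1) and 8 majority (0) samples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# A classifier that always predicts the majority class reaches 80%
# accuracy here, but never finds a minority sample.
always_majority = [0] * 10

# A classifier that finds one minority sample at the cost of two
# false alarms on the majority class.
some_minority = [1, 0, 0, 0, 0, 0, 0, 0, 1, 1]

print(g_mean(y_true, always_majority))
print(g_mean(y_true, some_minority))
```

Under this definition, the always-majority classifier scores zero despite its high accuracy, because its sensitivity is zero. This is the training bias that oversampling the minority class aims to counter.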

Notes

  1. https://sci2s.ugr.es/keel/datasets.php

  2. https://sci2s.ugr.es/keel/datasets.php

  3. https://developers.google.com/youtube/v3/docs

Funding

The authors did not receive support from any organization for the submitted work.

Author information

Corresponding author

Correspondence to Ashutosh Tripathi.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Human and animal resources

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent

The research does not involve any participants from whom informed consent would be required. Hence, this statement is not applicable to this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Bharti, K.K., Tripathi, A. & Ghosh, M. A fused grey wolf and artificial bee colony model for imbalanced data classification problems. Int J Syst Assur Eng Manag 15, 4085–4104 (2024). https://doi.org/10.1007/s13198-024-02412-w

  • DOI: https://doi.org/10.1007/s13198-024-02412-w
