A fused grey wolf and artificial bee colony model for imbalanced data classification problems

Bharti, Kusum Kumari; Tripathi, Ashutosh; Ghosh, Mohona

doi:10.1007/s13198-024-02412-w

Original Article
Published: 19 July 2024

Volume 15, pages 4085–4104, (2024)
International Journal of System Assurance Engineering and Management

Abstract

Imbalanced datasets, i.e., datasets with an uneven distribution of samples among classes, bias training and degrade the performance of learning algorithms. Several solutions for handling data imbalance have been proposed in the past, but most of them focus on removing majority class instances, which leads to the loss of important information. An alternative strategy investigated in the literature is the generation of minority class samples. However, generating high-quality synthetic samples for the minority class remains an open problem. In this study, a fusion of the grey wolf optimizer (GWO) with the artificial bee colony (ABC) algorithm is proposed to generate good representative samples of the minority class. The combination is analysed because GWO has good exploitation abilities, while ABC is good at exploration. The effectiveness of the proposed method is tested, using standard assessment indicators, on 20 real-world benchmark datasets and on one real-life application, i.e., scam video classification on YouTube. The performance of the proposed method is compared against 18 state-of-the-art data imbalance handling methods using three classification algorithms, i.e., support vector machine (SVM), k-nearest neighbours (KNN) and decision tree (DT). Our experimental results show an improvement in G-mean score on 18 out of 20 datasets for SVM, with a maximum improvement of 8%, and on 17 out of 20 datasets for KNN and DT, with maximum improvements of 10.7% and 6.3% respectively. An improvement in AUC score is also seen on 17 out of 20 datasets for SVM and DT, with maximum improvements of 4.5% and 6% respectively, and on 16 out of 20 datasets for KNN, with a maximum improvement of 7.7%. These results show that the proposed method is robust.
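
G-mean, one of the indicators reported above, is commonly defined as the geometric mean of sensitivity (recall on the minority class) and specificity (recall on the majority class). The abstract does not spell out the formula, so the following is only a minimal sketch under that standard definition. The function name `g_mean` and the toy labels are illustrative assumptions and are not taken from the article.

```python
import math


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity for binary labels.

    Assumption: `positive` marks the minority class; every other label
    is treated as the majority (negative) class.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    # Guard against empty classes so the sketch never divides by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Toy data (hypothetical): 8 majority samples (0) and 2 minority samples (1).
y_true = [0] * 8 + [1] * 2

# A classifier that always predicts the majority class.
all_majority = [0] * 10
accuracy_all_majority = sum(
    t == p for t, p in zip(y_true, all_majority)
) / len(y_true)
g_all_majority = g_mean(y_true, all_majority)

# A classifier that finds one minority sample at the cost of one false alarm.
balanced_pred = [0] * 7 + [1] + [1, 0]
g_balanced = g_mean(y_true, balanced_pred)
```

On this toy data the always-majority classifier is 80% accurate yet has a G-mean of zero, because it never detects the minority class. This is the usual reason G-mean is preferred over plain accuracy when classes are imbalanced.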

Notes

https://sci2s.ugr.es/keel/datasets.php
https://developers.google.com/youtube/v3/docs

Funding

The authors did not receive support from any organization for the submitted work.

Author information

Authors and Affiliations

Dr. B. R. Ambedkar National Institute of Technology, Jalandhar, India
Kusum Kumari Bharti
Pandit Deendayal Energy University, Gandhinagar, India
Ashutosh Tripathi
Indira Gandhi Delhi Technical University For Women, New Delhi, India
Mohona Ghosh

Corresponding author

Correspondence to Ashutosh Tripathi.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Human and animal resources

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent

The research does not include any participants from whom informed consent would be required. Hence, this statement is not applicable to this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Bharti, K.K., Tripathi, A. & Ghosh, M. A fused grey wolf and artificial bee colony model for imbalanced data classification problems. Int J Syst Assur Eng Manag 15, 4085–4104 (2024). https://doi.org/10.1007/s13198-024-02412-w

Received: 25 January 2024
Revised: 08 May 2024
Accepted: 23 June 2024
Published: 19 July 2024
Issue Date: August 2024
DOI: https://doi.org/10.1007/s13198-024-02412-w
