Abstract
The Adaptive Boosting (AdaBoost) algorithm is a widely used ensemble learning algorithm that can effectively improve classification performance on ordinary datasets when combined with many other types of learning algorithms. AdaBoost focuses on the overall classification performance of its weak classifiers and aims to minimize the overall classification error. However, it ignores the imbalance in the number of samples between classes, so it is not directly suitable for imbalanced data classification. To improve the classification accuracy of minority samples in imbalanced datasets, this paper proposes an improved AdaBoost algorithm based on weight adjustment factors (AdaBoost-C). It redefines the error rate function by assigning a higher weight to minority samples to emphasize their importance and a lower weight to majority samples to suppress theirs. In addition, this paper proposes an adaptive particle swarm optimization algorithm with exponential dynamic adjustment of the inertia weight (EIWAPSO) to further optimize the weights of the weak classifiers. This prevents the ensemble from generating redundant and useless weak classifiers that consume system resources, and helps the search avoid falling into local optima. The experimental results show that the proposed EIWAPSO-AdaBoost-C ensemble algorithm achieves the highest Recall and AUC values on datasets with different imbalance ratios (IR), and that its maximum, minimum and average errors are the lowest among the compared algorithms. The proposed algorithm therefore not only effectively improves the classification accuracy of minority samples on imbalanced datasets, but is also more stable.
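The two ideas named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual formulas, so everything below is an assumption made for illustration: the function names, the adjustment factors `c_minority` and `c_majority`, the normalization of the error rate, and the geometric form of the inertia decay are hypothetical and are not the authors' definitions. The adaptive component of EIWAPSO is not modeled.

```python
def class_weighted_error(y_true, y_pred, sample_weights, minority_label,
                         c_minority=2.0, c_majority=0.5):
    """Error rate in which each sample's weight is scaled by a class factor.

    Assumption: minority samples are scaled by c_minority (> 1) and
    majority samples by c_majority (< 1); these factors are hypothetical.
    """
    wrong = 0.0
    total = 0.0
    for y, p, w in zip(y_true, y_pred, sample_weights):
        factor = c_minority if y == minority_label else c_majority
        total += factor * w
        if y != p:
            wrong += factor * w
    return wrong / total


def exponential_inertia(iteration, max_iter, w_start=0.9, w_end=0.4):
    """Inertia weight that decays exponentially from w_start to w_end.

    Assumption: a geometric interpolation between the two endpoints;
    the paper's exact schedule may differ.
    """
    return w_start * (w_end / w_start) ** (iteration / max_iter)


if __name__ == "__main__":
    y_true = [1, 0, 0, 0]          # label 1 is the minority class
    y_pred = [0, 0, 0, 1]          # one minority and one majority error
    weights = [0.25, 0.25, 0.25, 0.25]
    print(class_weighted_error(y_true, y_pred, weights, minority_label=1))
    print([round(exponential_inertia(t, 100), 3) for t in (0, 50, 100)])
```

In this toy case the unweighted error rate would be 0.5, while the class-weighted error is larger because the misclassified minority sample carries more weight. That is the effect the abstract describes: a weak classifier that misses minority samples is penalized more heavily.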
Acknowledgements
The authors are very indebted to the anonymous referees for their critical comments and suggestions for the improvement of this paper.
Funding
This work was supported by grants from the National Natural Science Foundation of China (Major Program, No. 51991365) and the National Natural Science Foundation of China (No. 61673396).
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Li, X., Li, K. Imbalanced data classification based on improved EIWAPSO-AdaBoost-C ensemble algorithm. Appl Intell 52, 6477–6502 (2022). https://doi.org/10.1007/s10489-021-02708-5
DOI: https://doi.org/10.1007/s10489-021-02708-5