
Imbalanced data classification based on improved EIWAPSO-AdaBoost-C ensemble algorithm

Published in Applied Intelligence.

Abstract

The Adaptive Boosting (AdaBoost) algorithm is a widely used ensemble learning algorithm that can effectively improve classification performance on ordinary datasets when combined with many other types of learning algorithms. AdaBoost focuses on the overall classification performance of the weak classifiers and aims to minimize the overall classification error. However, it ignores the imbalance in the number of samples between classes, so it is not directly suitable for imbalanced data classification. To improve the classification accuracy of minority samples in imbalanced datasets, this paper proposes an improved AdaBoost algorithm based on weight adjustment factors (AdaBoost-C). It redefines the error rate function by assigning a higher weight to minority samples to emphasize their importance, and a lower weight to majority samples to suppress theirs. In addition, this paper proposes an adaptive particle swarm optimization algorithm with exponential dynamic adjustment of the inertia weight (EIWAPSO) to further optimize the weights of the weak classifiers. It effectively prevents the ensemble algorithm from generating redundant, useless weak classifiers that consume system resources, and avoids falling into local optima. Experimental results show that the Recall and AUC values of the proposed EIWAPSO-AdaBoost-C ensemble algorithm are the highest on datasets with different imbalance ratios (IR), and that the maximum, minimum, and average errors of the algorithm are the lowest among a variety of comparison algorithms. Therefore, the proposed algorithm not only effectively improves the classification accuracy of minority samples on imbalanced datasets, but is also more stable.
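The two ideas in the abstract can be illustrated with a short sketch: a class-dependent cost factor applied to the sample weights before computing a boosting round's error rate, and an exponentially decreasing inertia-weight schedule for PSO. This is a minimal illustration only; the cost factors `c_min`/`c_maj`, the decay constant, and the function names are assumptions, not the paper's exact formulations, which are not given in the abstract.

```python
import numpy as np

def exp_inertia_weight(t, t_max, w_start=0.9, w_end=0.4):
    """Exponentially decaying PSO inertia weight (illustrative schedule).

    Decays from w_start toward w_end as iteration t approaches t_max;
    the decay constant 5.0 is an assumed value, not the paper's.
    """
    return w_end + (w_start - w_end) * np.exp(-5.0 * t / t_max)

def cost_weighted_error(y_true, y_pred, sample_w, minority_label=1,
                        c_min=2.0, c_maj=0.5):
    """Weighted misclassification rate with class-dependent cost factors.

    Each sample's boosting weight is scaled up for the minority class and
    down for the majority class before normalization, in the spirit of
    cost-sensitive AdaBoost variants such as AdaBoost-C. The cost values
    c_min and c_maj are illustrative.
    """
    cost = np.where(y_true == minority_label, c_min, c_maj)
    w = sample_w * cost
    w = w / w.sum()  # renormalize so the error stays in [0, 1]
    return float(np.sum(w * (y_true != y_pred)))
```

With equal initial sample weights, misclassifying one minority sample now contributes more to the error than misclassifying one majority sample, which pushes subsequent weak classifiers to attend to the minority class.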




Acknowledgements

The authors are very indebted to the anonymous referees for their critical comments and suggestions for the improvement of this paper.

Funding

This work was supported by grants from the National Natural Science Foundation of China (Major Program, No. 51991365; No. 61673396).

Author information


Corresponding author

Correspondence to Kewen Li.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Li, X., Li, K. Imbalanced data classification based on improved EIWAPSO-AdaBoost-C ensemble algorithm. Appl Intell 52, 6477–6502 (2022). https://doi.org/10.1007/s10489-021-02708-5

