Skip to main content
Log in

An effective method using clustering-based adaptive decomposition and editing-based diversified oversamping for multi-class imbalanced datasets

  • Published:
Applied Intelligence Aims and scope Submit manuscript

Abstract

For multi-class imbalanced classification tasks that occur in many real-world applications, the class imbalance, which is caused by the case that some classes are not as frequent as other classes, and class overlap, which is caused by the case that some classes contains a similar number of data, are the major challenges. Both of them make the classification task complicated. The decomposition-based strategy is an effective way to improve the performance of multi-class imbalanced classification tasks. However, current studies based on this strategy have failed to solve the problems of class imbalance and overlapping simultaneously. Therefore, we propose an effective method , namely clustering-based adaptive decomposition and editing-based diversified oversamping procedure(CluAD-EdiDO), to solve the above problems in this paper. The proposed CluAD-EdiDO consists of two key components: the clustering-based adaptive decomposition and the editing-based diversified oversampling technique. The former is applied to group similar data samples of the data set into clusters(i.e., “sub-problems”). The latter is applied independently in different clusters to combat the imbalance and overlap, reducing the impact of the majority classes in overlapping region and oversampling the minority classes appropriately. Furthermore, a diversified ensemble learning framework is adopted to select the best classification algorithm for different sub-problems. Extensive experiments on 17 real-world datasets demonstrate that our method outperforms for multi-class imbalanced datasets.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3

Similar content being viewed by others

References

  1. Almeida TA, Almeida J, Yamakami A (2011) Spam filtering: how the dimensionality reduction affects the accuracy of naive bayes classifiers. Journal of Internet Services and Applications 1(3):183–200

    Article  Google Scholar 

  2. Liu Y, Zhang L, Nie L, Yan Y, Rosenblum DS (2016) Fortune teller: predicting your career path. In: Thirtieth AAAI conference on artificial intelligence

  3. Liu Y, Nie L, Han L, Zhang L, Rosenblum DS (2015) Action2activity: recognizing complex activities from sensor data. In: Twenty-fourth international joint conference on artificial intelligence

  4. Wu Q, Ye Y, Zhang H, Ng MK, Ho S-S (2014) Forestexter: an efficient random forest algorithm for imbalanced text categorization. Knowl-Based Syst 67:105–116

    Article  Google Scholar 

  5. Ghorai S, Mukherjee A, Dutta PK (2010) Discriminant analysis for fast multiclass data classification through regularized kernel function approximation. IEEE Transactions on Neural Networks 21(6):1020–1029

    Article  Google Scholar 

  6. Hsu C-W, Lin C-J (2002) A comparison of methods for multiclass support vector machines. IEEE Transactions on Neural Networks 13(2):415–425

    Article  Google Scholar 

  7. Knerr S, Personnaz L, Dreyfus G (1990) Single-layer learning revisited: a stepwise procedure for building and training a neural network. In: Neurocomputing, Springer, pp 41–50

  8. Clark P, Boswell R (1991) Rule induction with cn2: Some recent improvements. In: European working session on learning, Springer, pp 151–163

  9. Dietterich TG, Bakiri G (1994) Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research 2:263–286

    Article  Google Scholar 

  10. Vluymans S, Fernández A, Saeys Y, Cornelis C, Herrera F (2018) Dynamic affinity-based classification of multi-class imbalanced data with one-versus-one decomposition: a fuzzy rough set approach. Knowl Inf Syst 56(1):55–84

    Article  Google Scholar 

  11. Galar M, Fernández A, Barrenechea E, Herrera F (2015) Drcw-ovo: distance-based relative competence weighting combination for one-vs-one strategy in multi-class problems. Pattern Recognition 48(1):28–42

    Article  Google Scholar 

  12. Lee HK, Kim SB (2018) An overlap-sensitive margin classifier for imbalanced and overlapping data. Expert Syst Appl 98:72–83

    Article  Google Scholar 

  13. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) Smote: synthetic minority over-sampling technique. Journal of Artificial Intelligence Research 16:321–357

    Article  Google Scholar 

  14. Wilson DL (1972) Asymptotic properties of nearest neighbor rules using edited data. IEEE Transactions on Systems, Man, and Cybernetics (3), pp 408–421

  15. Batista GE, Prati RC, Monard MC (2004) A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD Explorations Newsletter 6(1):20–29

    Article  Google Scholar 

  16. Lin W-C, Tsai C-F, Hu Y-H, Jhang J-S (2017) Clustering-based undersampling in class-imbalanced data. Inf Sci 409:17–26

    Article  Google Scholar 

  17. Zhang Z, Krawczyk B, Garcìa S., Rosales-Pérez A., Herrera F (2016) Empowering one-vs-one decomposition with ensemble learning for multi-class imbalanced data. Knowl-Based Syst 106:251–263

    Article  Google Scholar 

  18. Li D-C, Liu C-W, Hu SC (2010) A learning method for the class imbalance problem with medical data sets. Computers in Biology and Medicine 40(5):509–518

    Article  Google Scholar 

  19. Zhu T, Lin Y, Liu Y, Zhang W, Zhang J (2019) Minority oversampling for imbalanced ordinal regression. Knowl-Based Syst 166:140–155

    Article  Google Scholar 

  20. Zhu T, Lin Y, Liu Y (2017) Synthetic minority oversampling technique for multiclass imbalance problems. Pattern Recogn 72:327–340

    Article  Google Scholar 

  21. Schapire RE (1990) The strength of weak learnability. Mach Learn 5(2):197–227

    Google Scholar 

  22. Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140

    MATH  Google Scholar 

  23. Wang S, Yao X (2009) Diversity analysis on imbalanced data sets by using ensemble models. In: 2009 IEEE symposium on computational intelligence and data mining, IEEE, pp 324–331

  24. Galar M, Fernandez A, Barrenechea E, Bustince H, Herrera F (2012) A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 42(4):463–484

    Article  Google Scholar 

  25. Gónzalez S, García S, Lázaro M, Figueiras-Vidal AR, Herrera F (2017) Class switching according to nearest enemy distance for learning from highly imbalanced data-sets. Pattern Recogn 70:12–24

    Article  Google Scholar 

  26. García S, Zhang Z-L, Altalhi A, Alshomrani S, Herrera F (2018) Dynamic ensemble selection for multi-class imbalanced datasets. Inf Sci 445:22–37

    Article  MathSciNet  Google Scholar 

  27. Wang S, Yao X (2012) Multiclass imbalance problems: Analysis and potential solutions. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 42(4):1119–1130

    Article  Google Scholar 

  28. Fernández-Navarro F, Hervás-Martínez C, Gutiérrez PA (2011) A dynamic over-sampling procedure based on sensitivity for multi-class problems. Pattern Recogn 44(8):1821–1833

    Article  Google Scholar 

  29. Abdi L, Hashemi S (2015) To combat multi-class imbalanced problems by means of over-sampling techniques. IEEE Trans Knowledge and Data Eng 28(1):238–251

    Article  Google Scholar 

  30. Ghanem AS, Venkatesh S, West G (2010) Multi-class pattern classification in imbalanced data. In: 2010 20th international conference on pattern recognition, IEEE, pp 2881–2884

  31. Galar M, Fernández A, Barrenechea E, Bustince H, Herrera F (2013) Dynamic classifier selection for one-vs-one strategy: avoiding non-competent classifiers. Pattern Recogn 46(12):3412–3424

    Article  Google Scholar 

  32. Kang S, Cho S, Kang P (2015) Constructing a multi-class classifier using one-against-one approach with different binary classifiers. Neurocomputing 149:677–682

    Article  Google Scholar 

  33. Datta S, Das S (2015) Near-bayesian support vector machines for imbalanced data classification with equal or unequal misclassification costs. Neural Netw 70:39–52

    Article  Google Scholar 

  34. Bi J, Zhang C (2018) An empirical comparison on state-of-the-art multi-class imbalance learning algorithms and a new diversified ensemble learning scheme. Knowl-Based Syst 158:81–93

    Article  Google Scholar 

  35. Ackermann MR, Blömer J, Kuntze D, Sohler C (2014) Analysis of agglomerative clustering. Algorithmica 69(1):184–215

    Article  MathSciNet  Google Scholar 

  36. Napierala K, Stefanowski J (2016) Types of minority class examples and their influence on learning classifiers from imbalanced data. J Intel Inform Syst 46(3):563–597

    Article  Google Scholar 

  37. Santoso B, Wijayanto H, Notodiputro KA, Sartono B (2018) K-neighbor over-sampling with cleaning data: a new approach to improve classification performance in data sets with class imbalance. Appl Math Sci 12(10):449–460

    Google Scholar 

  38. Wu T-F, Lin C-J, Weng RC (2004) Probability estimates for multi-class classification by pairwise coupling. J Mach Learn Res 5(Aug):975–1005

    MathSciNet  MATH  Google Scholar 

  39. Triguero I, González S, Moyano JM, García López S, Alcalá Fernández J, Luengo Martín J, Fernández Hilario A, Díaz J, Sánchez L, Herrera F et al Keel 3.0: an open source software for multi-stage analysis in data mining

  40. Asuncion A, Newman D (2007) Uci machine learning repository

  41. Wilcoxon F (1945) Individual comparisons by ranking methods. Biometrics Bulletin 1(6):80–83

    Article  Google Scholar 

  42. Zeng N, Wang Z, Zhang H, Liu W, Alsaadi FE (2016) Deep belief networks for quantitative analysis of a gold immunochromatographic strip. Cognitive Computation 8(4):684–692

    Article  Google Scholar 

  43. Chen Z, Lin T, Xia X, Xu H, Ding S (2018) A synthetic neighborhood generation based ensemble learning for the imbalanced data classification. Appl Intell 48(8):2441– 2457

    Article  Google Scholar 

  44. Akkasi A, Varoğlu E, Dimililer N (2017) Balanced undersampling: a novel sentence-based undersampling method to improve recognition of named entities in chemical and biomedical text. Appl Intell, pp 1–14

  45. Zhang C, Bi J, Xu S, Ramentol E, Fan G, Qiao B, Fujita H (2019) Multi-imbalance: an open-source software for multi-class imbalance learning. Knowl-Based Syst 50:137–143

    Article  Google Scholar 

  46. Li K-S, Wang H-R, Liu K-H (2019) A novel error-correcting output codes algorithm based on genetic programming. Swarm and Evolutionary Computation 50:100564

    Article  Google Scholar 

  47. Beyan C, Fisher R (2015) Classifying imbalanced data sets using similarity based hierarchical decomposition. Pattern Recogn 48(5):1653–1672

    Article  Google Scholar 

  48. Benjilali W, Guicquero W, Jacques L, Sicard G (2019) Exploring hierarchical machine learning for hardware-limited multi-class inference on compressed measurements. In: 2019 IEEE International Symposium on Circuits and Systems (ISCAS), IEEE, pp 1–5

  49. Seiffert C, Khoshgoftaar TM, Van Hulse J, Napolitano A (2009) Rusboost: A hybrid approach to alleviating class imbalance. IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans 40(1):185–197

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Xiangtao Chen.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Chen, X., Zhang, L., Wei, X. et al. An effective method using clustering-based adaptive decomposition and editing-based diversified oversamping for multi-class imbalanced datasets. Appl Intell 51, 1918–1933 (2021). https://doi.org/10.1007/s10489-020-01883-1

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10489-020-01883-1

Keywords

Navigation