Adaptively weighted three-way decision oversampling: A cluster imbalanced-ratio based approach

Applied Intelligence

Abstract

Oversampling is an effective approach to imbalanced learning because it restores class balance simply by synthesizing new minority samples. Synthesizing those samples precisely, however, remains difficult, mainly because of noise samples, within-class imbalance, and the selection of boundary samples. To address these problems, this paper proposes an improved oversampling method for imbalanced learning, called adaptively weighted three-way decision oversampling (AWTDO). AWTDO works in three main steps. First, it roughly removes noise samples, applies the K-means clustering algorithm to the raw data to form multiple clusters, and calculates the imbalanced ratio of each cluster. Second, it uses three-way decision to classify the clusters by imbalanced ratio into three categories (positive domain, boundary domain, and negative domain) and assigns each cluster a number of synthetic samples that depends on its category. Third, it deterministically selects the target minority samples in each cluster and generates new synthetic samples by stochastic linear interpolation, using different sampling weights. Comparative experiments on public datasets show that AWTDO outperforms nine state-of-the-art oversampling methods.
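The three steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no thresholds, weights, or noise filter, so everything specific here is an assumption made for illustration. The function names (`kmeans`, `three_way_region`, `oversample`), the thresholds `alpha` and `beta`, the per-domain weights, and the use of the minority share of a cluster in place of the paper's imbalanced ratio are all hypothetical, and the noise-removal step is omitted.

```python
import numpy as np


def kmeans(X, k, rng, n_iter=20):
    """Plain Lloyd's k-means; returns one cluster label per row of X."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def three_way_region(n_min, n_maj, alpha=0.7, beta=0.3):
    """Assign a cluster to a domain from its minority share.

    Assumption: alpha and beta are illustrative thresholds, and the
    minority share stands in for the paper's imbalanced ratio.
    """
    share = n_min / (n_min + n_maj)
    if share >= alpha:
        return "positive"
    if share <= beta:
        return "negative"
    return "boundary"


def oversample(X, y, k=3, weights=None, seed=0):
    """Return synthetic minority samples (class 1) for a binary dataset."""
    rng = np.random.default_rng(seed)
    # Assumption: hypothetical per-domain sampling weights.
    weights = weights or {"positive": 1.0, "boundary": 2.0, "negative": 0.0}
    labels = kmeans(X, k, rng)
    n_needed = int((y == 0).sum() - (y == 1).sum())

    # Step 2: categorize each cluster and give it a sampling weight.
    info = []
    for j in np.unique(labels):
        in_cluster = labels == j
        n_min = int((in_cluster & (y == 1)).sum())
        n_maj = int((in_cluster & (y == 0)).sum())
        region = three_way_region(n_min, n_maj)
        # Interpolation needs at least two minority samples in the cluster.
        w = weights[region] * n_min if n_min >= 2 else 0.0
        info.append((j, w))
    total = sum(w for _, w in info)

    # Step 3: fixed target order, random partner, random interpolation gap.
    synth = []
    if total > 0 and n_needed > 0:
        for j, w in info:
            n_new = int(round(n_needed * w / total))
            pts = X[(labels == j) & (y == 1)]
            for i in range(n_new):
                t = i % len(pts)  # targets are visited in a fixed cycle
                partner = rng.choice(np.delete(np.arange(len(pts)), t))
                gap = rng.random()
                synth.append(pts[t] + gap * (pts[partner] - pts[t]))
    return np.array(synth).reshape(-1, X.shape[1])
```

Calling `oversample(X, y)` with a float feature matrix `X` and labels `y` (0 for majority, 1 for minority) returns only the synthetic rows, which the caller would append to the training set. Every synthetic point lies on a segment between two minority samples of the same cluster, which is the property that cluster-wise interpolation is meant to guarantee.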

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant No. 62073223.

Corresponding author

Correspondence to Xinli Wang.

Cite this article

Wang, X., Gong, J., Song, Y. et al. Adaptively weighted three-way decision oversampling: A cluster imbalanced-ratio based approach. Appl Intell 53, 312–335 (2023). https://doi.org/10.1007/s10489-022-03394-7

  • DOI: https://doi.org/10.1007/s10489-022-03394-7
