
A novel synthetic minority oversampling technique based on relative and absolute densities for imbalanced classification

Published in: Applied Intelligence

Abstract

Learning a classifier from class-imbalanced data is an important challenge. Among existing solutions, SMOTE has received great attention and has an extensive range of practical applications. However, SMOTE and its extensions often degrade because of noise generation and within-class imbalance, and although many variants of SMOTE have been developed, few of them address both problems at the same time. Moreover, many improvements of SMOTE rely on more sophisticated models and introduce additional external parameters. To handle both between-class and within-class imbalance while avoiding noise generation, a novel synthetic minority oversampling technique based on relative and absolute densities (SMOTE-RD) is proposed. First, a noise filter based on relative density removes noise and smooths the class boundary. Second, sparsity and boundary weights are calculated from the relative and absolute densities, respectively. Third, normalized weights derived from the sparsity and boundary weights are used to generate more synthetic minority samples in the class boundary and in sparse regions. The main advantages of the proposed algorithm are that (a) it effectively avoids noise generation while removing noise and smoothing the class boundary in the original data, (b) it generates more synthetic samples in class boundaries and sparse regions, and (c) it introduces no additional parameters. Extensive experiments show that SMOTE-RD outperforms seven popular oversampling methods in average AUC, average F-measure, and average G-mean on real data sets at an acceptable time cost.
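The three steps above suggest a compact implementation. The Python sketch below illustrates one plausible reading of the procedure: a relative-density noise filter, sparsity and boundary weights, and weighted SMOTE-style interpolation. The specific density estimates, the value of k, the 10th-percentile noise cutoff, and the helper name smote_rd_sketch are illustrative assumptions rather than the paper's exact formulation; the authors' reference implementation is linked under Code availability.

```python
# Minimal sketch of a relative/absolute-density-weighted SMOTE variant,
# following the three steps described in the abstract. All concrete choices
# (density estimates, k, the noise cutoff) are assumptions of this sketch,
# not the paper's exact formulation.
import numpy as np
from sklearn.neighbors import NearestNeighbors


def smote_rd_sketch(X_min, X_maj, n_new, k=5, rng=None):
    """Oversample minority samples X_min against majority samples X_maj."""
    rng = np.random.default_rng(rng)

    # Absolute density: inverse mean distance to the k nearest minority
    # neighbors (larger value means a denser minority neighborhood).
    nn_min = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    d_min, _ = nn_min.kneighbors(X_min)
    abs_density = 1.0 / (d_min[:, 1:].mean(axis=1) + 1e-12)

    # Relative density: minority density divided by the local majority
    # density around the same sample (assumed definition).
    nn_maj = NearestNeighbors(n_neighbors=k).fit(X_maj)
    d_maj, _ = nn_maj.kneighbors(X_min)
    maj_density = 1.0 / (d_maj.mean(axis=1) + 1e-12)
    rel_density = abs_density / maj_density

    # Step 1: noise filter. Minority samples with very low relative density
    # sit deep inside the majority class; drop them (10% cutoff is illustrative).
    keep = rel_density > np.percentile(rel_density, 10)
    X_kept = X_min[keep]
    abs_kept, rel_kept = abs_density[keep], rel_density[keep]

    # Step 2: sparsity weight from the relative density and boundary weight
    # from the absolute density, as stated in the abstract; both give
    # low-density samples a larger weight in this sketch.
    sparsity_w = 1.0 / (rel_kept + 1e-12)
    boundary_w = 1.0 / (abs_kept + 1e-12)

    # Step 3: normalized weights decide how often each kept minority sample
    # is used as the seed of a synthetic sample.
    w = sparsity_w * boundary_w
    w /= w.sum()

    # SMOTE-style interpolation between a weight-drawn seed and one of its
    # k nearest minority neighbors.
    nn_kept = NearestNeighbors(n_neighbors=min(k + 1, len(X_kept))).fit(X_kept)
    _, idx_kept = nn_kept.kneighbors(X_kept)
    seeds = rng.choice(len(X_kept), size=n_new, p=w)
    synthetic = []
    for i in seeds:
        j = rng.choice(idx_kept[i][1:])  # random minority neighbor (not self)
        gap = rng.random()
        synthetic.append(X_kept[i] + gap * (X_kept[j] - X_kept[i]))
    return np.vstack(synthetic)
```

In use, one would call smote_rd_sketch(X_min, X_maj, n_new=len(X_maj) - len(X_min)) to bring the two classes to roughly equal size before training a classifier.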


Data availability

The datasets and third-party libraries used in the experiments are open source and accessible online (http://archive.ics.uci.edu/ml/datasets.php).
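The experiments report average AUC, F-measure, and G-mean. For readers reproducing the evaluation, the sketch below shows one way to compute these three metrics for a binary imbalanced problem with scikit-learn; the synthetic dataset and decision-tree classifier are placeholders standing in for the paper's UCI data sets and experimental protocol.

```python
# Hedged example: computing AUC, F-measure, and G-mean for a binary
# imbalanced classification task. The classifier and data split are
# placeholders, not the experimental protocol of the paper.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score, f1_score, confusion_matrix

# A synthetic imbalanced dataset stands in for the UCI data sets.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
y_pred = clf.predict(X_te)
y_score = clf.predict_proba(X_te)[:, 1]

auc = roc_auc_score(y_te, y_score)
f_measure = f1_score(y_te, y_pred)  # minority class is label 1
tn, fp, fn, tp = confusion_matrix(y_te, y_pred).ravel()
g_mean = np.sqrt((tp / (tp + fn)) * (tn / (tn + fp)))  # sqrt(sensitivity * specificity)
print(f"AUC={auc:.3f}  F-measure={f_measure:.3f}  G-mean={g_mean:.3f}")
```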


Code availability

The source code is available at https://github.com/liurj2021/SMOTERDCodes.git

Author information

Correspondence to Ruijuan Liu.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Liu, R. A novel synthetic minority oversampling technique based on relative and absolute densities for imbalanced classification. Appl Intell 53, 786–803 (2023). https://doi.org/10.1007/s10489-022-03512-5

