Abstract
Learning a classifier from class-imbalanced data is an important challenge. Among existing solutions, SMOTE has received wide praise and enjoys an extensive range of practical applications. However, SMOTE and its extensions often degrade due to noise generation and within-class imbalance. Although many SMOTE variants have been developed, few of them address both problems at the same time, and many improvements rely on advanced models that introduce external parameters. To address between-class and within-class imbalance while avoiding noise generation, a novel synthetic minority oversampling technique based on relative and absolute densities (SMOTE-RD) is proposed. First, a novel noise filter based on relative density is proposed to remove noise and smooth the class boundary. Second, sparsity and boundary weights are proposed, calculated from relative and absolute densities, respectively. Third, normalized weights combining the sparsity and boundary weights are proposed to generate more synthetic minority-class samples in boundary and sparse regions. The main advantages of the proposed algorithm are that: (a) it effectively avoids noise generation while removing noise and smoothing the class boundary in the original data; (b) it generates more synthetic samples in class-boundary and sparse regions; (c) no additional parameters are introduced. Extensive experiments show that SMOTE-RD outperforms 7 popular oversampling methods in average AUC, average F-measure, and average G-mean on real data sets at an acceptable time cost.
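To make the sparsity-weighting idea concrete, the following is a minimal sketch of SMOTE-style interpolation in which seed points are sampled with probability inversely proportional to a k-nearest-neighbor relative-density estimate, so sparse minority regions receive more synthetic samples. The density definition, function names, and parameters here are illustrative assumptions, not the paper's exact formulation; in particular, the boundary weight and the noise-filtering step of SMOTE-RD are omitted.

```python
import numpy as np

def relative_density(X, k=5):
    """Density proxy: inverse mean distance to the k nearest neighbors,
    normalized by its average over all points (an assumed definition)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                  # exclude self-distance
    knn_mean = np.sort(d, axis=1)[:, :k].mean(axis=1)
    dens = 1.0 / (knn_mean + 1e-12)
    return dens / dens.mean()

def density_weighted_smote(X_min, n_new, k=3, seed=None):
    """Generate n_new synthetic minority samples by SMOTE interpolation,
    choosing seed points with probability proportional to 1/density
    (the sparsity-weighting idea only)."""
    rng = np.random.default_rng(seed)
    k_eff = min(k, len(X_min) - 1)
    dens = relative_density(X_min, k=k_eff)
    w = 1.0 / dens
    w /= w.sum()                                 # sampling distribution over seeds
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    nn_idx = np.argsort(d, axis=1)[:, :k_eff]    # k nearest minority neighbors
    out = []
    for _ in range(n_new):
        i = rng.choice(len(X_min), p=w)          # sparse points chosen more often
        j = rng.choice(nn_idx[i])                # random neighbor of the seed
        gap = rng.random()                       # interpolation factor in [0, 1)
        out.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(out)
```

Because each synthetic point is a convex combination of two minority samples, all generated points stay inside the bounding box of the minority class; the isolated point far from the cluster is low-density and therefore seeds disproportionately many synthetic samples.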
Data availability
The datasets and third-party libraries used in the experiments are open source and accessible online (http://archive.ics.uci.edu/ml/datasets.php).
Code availability
The source code is available at https://github.com/liurj2021/SMOTERDCodes.git
Cite this article
Liu, R. A novel synthetic minority oversampling technique based on relative and absolute densities for imbalanced classification. Appl Intell 53, 786–803 (2023). https://doi.org/10.1007/s10489-022-03512-5