Abstract
The class imbalance problem is widely recognized as one of the critical issues in classification. Furthermore, the existing literature shows that other factors, such as class overlap, small disjuncts, and noise, further degrade classification performance when they are combined with class imbalance. In this work, we focus on the joint effects of class imbalance and class overlap, and study the binary classification performance of six algorithms under different combinations of imbalance ratios and overlap degrees. The experiments corroborate that different types of classifiers differ in their robustness to class imbalance and overlap degree. We conclude that, in essence, the densities of different regions of the data space affect classification performance. In addition, based on observations from our experiments, we infer that changing the densities of different regions of the data space should be a good way to address the problem of class imbalance and class overlap.
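The kind of experiment described above, a grid over imbalance ratios and overlap degrees, can be illustrated with a small sketch. This is not the authors' setup: the abstract does not name the six algorithms, the data generator, or the metric. The sketch assumes two 2-D unit-variance Gaussian classes, controls overlap through a hypothetical `gap` parameter (the distance between class means), uses a plain k-nearest-neighbour vote as a stand-in classifier, and reports minority-class recall. All function and parameter names are illustrative.

```python
import random


def make_data(n_major, imbalance_ratio, gap, rng):
    """Draw two 2-D Gaussian classes with unit variance (an assumed toy generator).

    imbalance_ratio: majority size divided by minority size.
    gap: distance between the class means along x; a smaller gap means more overlap.
    """
    n_minor = max(1, round(n_major / imbalance_ratio))
    major = [((rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)), 0) for _ in range(n_major)]
    minor = [((rng.gauss(gap, 1.0), rng.gauss(0.0, 1.0)), 1) for _ in range(n_minor)]
    return major + minor


def knn_predict(train, point, k=5):
    """Majority vote among the k training samples closest to `point`."""
    nearest = sorted(
        train,
        key=lambda s: (s[0][0] - point[0]) ** 2 + (s[0][1] - point[1]) ** 2,
    )[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > len(nearest) else 0


def minority_recall(train, test, k=5):
    """Fraction of minority-class test samples that are predicted as minority."""
    minority = [(x, y) for x, y in test if y == 1]
    hits = sum(knn_predict(train, x, k) == 1 for x, _ in minority)
    return hits / len(minority)


def run_grid(ratios=(1, 5, 20), gaps=(4.0, 2.0, 0.5), n_major=200, seed=0):
    """Evaluate every (imbalance ratio, gap) combination on fresh train/test draws."""
    results = {}
    for ratio in ratios:
        for gap in gaps:
            rng = random.Random(seed)  # same seed per cell keeps runs reproducible
            train = make_data(n_major, ratio, gap, rng)
            test = make_data(n_major, ratio, gap, rng)
            results[(ratio, gap)] = minority_recall(train, test)
    return results


if __name__ == "__main__":
    for (ratio, gap), recall in run_grid().items():
        print(f"IR={ratio:>2}  gap={gap:.1f}  minority recall={recall:.2f}")
```

In a toy setting like this, minority recall would be expected to fall as the gap shrinks and the imbalance ratio grows, since the minority class becomes locally sparse relative to the majority in the overlap region. That is one way to read the density argument in the abstract, but the sketch is only an illustration of it, not evidence for the paper's results.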
Acknowledgments
The authors acknowledge the National Natural Science Foundation of China (Grant: 62066039), the Natural Science Foundation of Qinghai Province (Grant: 2022-ZJ-925), and the “111” Project (D20035).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Fan, Y., Huang, H., DangZhi, C., Ji, X., Wu, Q. (2024). An Experimental Study of the Joint Effects of Class Imbalance and Class Overlap. In: Han, H., Baker, E. (eds) Next Generation Data Science. SDSC 2023. Communications in Computer and Information Science, vol 2113. Springer, Cham. https://doi.org/10.1007/978-3-031-61816-1_9
DOI: https://doi.org/10.1007/978-3-031-61816-1_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-61815-4
Online ISBN: 978-3-031-61816-1
eBook Packages: Computer Science, Computer Science (R0)