Abstract
The paper focuses on methods and algorithms for oversampling two-class imbalanced datasets. We propose a taxonomy of oversampling approaches and review state-of-the-art algorithms. The paper also discusses strengths and weaknesses of the oversampling methods. A computational experiment compares the performance of several oversampling algorithms. The conclusions discuss possible directions for future work on balancing imbalanced datasets to achieve better performance when mining them.
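As a rough illustration of what oversampling a two-class dataset involves, the sketch below generates synthetic minority points by interpolating between a minority sample and one of its nearest minority neighbours, in the spirit of SMOTE-style methods. It is not the procedure evaluated in the paper: the function name smote_like_oversample, the parameters k and seed, and the toy data are all assumptions made for this example.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Hypothetical helper for illustration only; it mimics the interpolation
    idea behind SMOTE-style oversampling, not any specific published variant.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base, excluding base itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Toy two-class data (assumed values): 4 minority points, 12 majority points.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
majority = [(3.0 + 0.1 * i, 3.0 - 0.05 * i) for i in range(12)]

# Add enough synthetic minority points to match the majority class size.
new_points = smote_like_oversample(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay inside the region the minority class already occupies. This is one reason why interpolation-based methods can struggle when the minority class is noisy or overlaps the majority class.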
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Jedrzejowicz, P. (2022). Oversampling for Mining Imbalanced Datasets: Taxonomy and Performance Evaluation. In: Nguyen, N.T., Manolopoulos, Y., Chbeir, R., Kozierkiewicz, A., Trawiński, B. (eds) Computational Collective Intelligence. ICCCI 2022. Lecture Notes in Computer Science, vol 13501. Springer, Cham. https://doi.org/10.1007/978-3-031-16014-1_26
DOI: https://doi.org/10.1007/978-3-031-16014-1_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16013-4
Online ISBN: 978-3-031-16014-1
eBook Packages: Computer Science, Computer Science (R0)