
Oversampling for Mining Imbalanced Datasets: Taxonomy and Performance Evaluation

  • Conference paper
Computational Collective Intelligence (ICCCI 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13501)

Abstract

The paper focuses on methods and algorithms for oversampling two-class imbalanced datasets. We propose a taxonomy of oversampling approaches and review state-of-the-art algorithms. The paper also discusses some strengths and weaknesses of these oversampling methods. A computational experiment compares the performance of several oversampling algorithms. The conclusions outline possible directions for future work on balancing imbalanced datasets so that they can be mined with better performance.
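The abstract does not name the individual algorithms it reviews. As a point of reference only, the sketch below shows SMOTE-style interpolation (Chawla et al., 2002), one of the best-known ways to oversample the minority class of a two-class dataset: each synthetic point lies on the segment between a minority sample and one of its k nearest minority neighbours. It is not the author's taxonomy or any specific method evaluated in the paper. The function names `smote_like_oversample` and `balance` are hypothetical, and the sketch assumes numeric features, Euclidean distance, and at least two minority samples.

```python
import numpy as np


def smote_like_oversample(X_min, n_new, k=5, rng=None):
    """Hypothetical helper: create n_new synthetic minority samples by
    SMOTE-style interpolation. Assumes X_min is (n, d) numeric with n >= 2."""
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)  # a point cannot have more neighbours than other minority points
    # Pairwise squared Euclidean distances among minority samples.
    diff = X_min[:, None, :] - X_min[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # exclude each point from its own neighbour list
    nn = np.argsort(d2, axis=1)[:, :k]  # k nearest minority neighbours per sample
    # For each synthetic point, pick a base sample and one of its neighbours.
    base = rng.integers(0, n, size=n_new)
    partner = nn[base, rng.integers(0, k, size=n_new)]
    # Place the new point uniformly at random on the segment base -> partner.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[partner] - X_min[base])


def balance(X, y, minority_label, k=5, rng=None):
    """Hypothetical helper: oversample the minority class until both classes
    have the same number of samples."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X_min = X[y == minority_label]
    n_new = int((y != minority_label).sum() - len(X_min))
    if n_new <= 0:
        return X, y
    X_syn = smote_like_oversample(X_min, n_new, k=k, rng=rng)
    y_syn = np.full(n_new, minority_label, dtype=y.dtype)
    return np.vstack([X, X_syn]), np.concatenate([y, y_syn])


if __name__ == "__main__":
    gen = np.random.default_rng(0)
    X_maj = gen.normal(0.0, 1.0, size=(90, 2))  # 90 majority samples
    X_min = gen.normal(3.0, 0.5, size=(10, 2))  # 10 minority samples
    X = np.vstack([X_maj, X_min])
    y = np.array([0] * 90 + [1] * 10)
    Xb, yb = balance(X, y, minority_label=1, k=5, rng=1)
    print(np.bincount(yb))  # [90 90]
```

Interpolating only between minority neighbours keeps synthetic points inside the local region spanned by existing minority samples; many of the refinements surveyed in the oversampling literature change how the base samples, the neighbours, or the interpolation gap are chosen.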

Author information

Correspondence to Piotr Jedrzejowicz.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Jedrzejowicz, P. (2022). Oversampling for Mining Imbalanced Datasets: Taxonomy and Performance Evaluation. In: Nguyen, N.T., Manolopoulos, Y., Chbeir, R., Kozierkiewicz, A., Trawiński, B. (eds) Computational Collective Intelligence. ICCCI 2022. Lecture Notes in Computer Science, vol 13501. Springer, Cham. https://doi.org/10.1007/978-3-031-16014-1_26

  • DOI: https://doi.org/10.1007/978-3-031-16014-1_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-16013-4

  • Online ISBN: 978-3-031-16014-1

  • eBook Packages: Computer Science, Computer Science (R0)
