Noise-Robust Gaussian Distribution Based Imbalanced Oversampling

  • Conference paper
Algorithms and Architectures for Parallel Processing (ICA3PP 2023)

Abstract

Imbalanced data classification has become a prominent topic in data mining and machine learning. Oversampling, a mainstream approach to the imbalance problem, synthesizes new samples to balance the data distribution. However, because synthesis relies on limited local sample information, it risks worsening class overlap and is vulnerable to data noise. In this paper, we propose a noise-robust Gaussian distribution based imbalanced oversampling method (NGOS). NGOS first determines the neighborhood radius from global information, then assigns sampling weights to minority class samples based on the density and distance information within each neighborhood. Finally, NGOS generates new samples with a Gaussian distribution model. We validate the proposed method on 38 KEEL datasets using a decision tree (DT) classifier and eleven comparison methods. Experimental results show that our method outperforms the compared methods in terms of F-measure, AUC, and G-mean. The code for NGOS is available at https://github.com/ytyancp/NGOS.
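
The three stages named in the abstract (a global neighborhood radius, density and distance based sampling weights, Gaussian generation) can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so everything concrete here is an assumption made for illustration and not the authors' method: the radius rule (mean distance to the k-th nearest neighbor over all samples), the weight formula, the rule that zeroes out isolated minority points, the isotropic Gaussian with standard deviation `sigma_scale * radius`, and the function name `ngos_like_oversample` with all of its parameters. The released repository is the place to look for the actual algorithm.

```python
import numpy as np


def ngos_like_oversample(X, y, minority_label, n_new, k=5, sigma_scale=0.5, seed=0):
    """Illustrative oversampler; every formula below is an assumption."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)

    # Pairwise Euclidean distances between all samples.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

    # Stage 1 (assumed rule): one global radius, the mean distance from each
    # sample to its k-th nearest neighbor (column 0 is the sample itself).
    kth = np.sort(dist, axis=1)[:, min(k, len(X) - 1)]
    radius = kth.mean()

    # Stage 2 (assumed rule): weight each minority sample by the minority
    # density in its neighborhood and by how far the majority neighbors are.
    is_min = y == minority_label
    min_idx = np.flatnonzero(is_min)
    weights = np.zeros(len(min_idx))
    for j, i in enumerate(min_idx):
        in_ball = dist[i] <= radius
        in_ball[i] = False  # exclude the sample itself
        n_min = np.count_nonzero(in_ball & is_min)
        n_maj = np.count_nonzero(in_ball & ~is_min)
        if n_min == 0 and n_maj > 0:
            continue  # surrounded only by majority samples: treated as noise
        density = (n_min + 1) / (n_min + n_maj + 1)
        maj_d = dist[i][in_ball & ~is_min]
        closeness = maj_d.mean() / radius if n_maj else 1.0
        weights[j] = density * closeness
    if weights.sum() == 0:
        weights[:] = 1.0  # fall back to uniform sampling
    probs = weights / weights.sum()

    # Stage 3 (assumed rule): draw seeds by weight, then perturb each seed
    # with isotropic Gaussian noise scaled by the global radius.
    seeds = rng.choice(min_idx, size=n_new, p=probs)
    noise = rng.normal(0.0, sigma_scale * radius, size=(n_new, X.shape[1]))
    return X[seeds] + noise, probs
```

Under these assumptions, a minority point that sits alone inside a majority region receives zero sampling probability, so no synthetic samples are generated around it. That is one simple way to read the noise robustness the abstract describes; the paper may realize it differently.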

This work was supported in part by the National Natural Science Foundation of China under Grant 62376002.



Author information

Corresponding author

Correspondence to Yuanting Yan.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Shao, X., Yan, Y. (2024). Noise-Robust Gaussian Distribution Based Imbalanced Oversampling. In: Tari, Z., Li, K., Wu, H. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2023. Lecture Notes in Computer Science, vol 14488. Springer, Singapore. https://doi.org/10.1007/978-981-97-0801-7_13

  • DOI: https://doi.org/10.1007/978-981-97-0801-7_13

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-0800-0

  • Online ISBN: 978-981-97-0801-7

  • eBook Packages: Computer Science, Computer Science (R0)
