A No Parameter Synthetic Minority Oversampling Technique Based on Finch for Imbalanced Data

  • Conference paper
Advanced Intelligent Computing Technology and Applications (ICIC 2023)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14089)

Abstract

The synthetic minority oversampling technique (SMOTE) has become a widely used approach to class imbalance in machine learning. However, it suffers from the uneven distribution of minority class data and from concerns about the quality of the synthetic data it produces. Enhanced variants that combine SMOTE with clustering algorithms face further problems, such as the difficulty of choosing optimal hyperparameter values and class overlap. This paper therefore proposes an improved algorithm named NP-SMOTE, which works as follows. First, the FINCH algorithm clusters the minority class data into distinct clusters. Next, the data within each cluster are divided into boundary data and central data according to the classes of each minority sample's nearest neighbors. Finally, a suitable synthesis method is applied to each of these two groups of minority data. The algorithm requires no predetermined hyperparameters and avoids the class-overlap problem by synthesizing data for each group in a tailored manner. Comparisons with commonly used algorithms on 6 datasets show that it is robust and generalizes better than these algorithms.
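The abstract describes the three NP-SMOTE steps only at a high level, so the following Python sketch is a hypothetical illustration of that pipeline, not the authors' implementation. It uses only the first partition of FINCH (the full FINCH method of Sarfraz et al. builds a hierarchy of partitions by repeatedly merging clusters). The boundary rule (a minority sample is "boundary" if at least half of its k nearest neighbours are majority samples, with k = 5 as an assumed default) and both synthesis rules are assumptions, because the abstract does not specify them; note that this assumed k is a parameter the paper's own method claims to avoid. All function names are hypothetical.

```python
import numpy as np


def finch_first_partition(X):
    """First FINCH partition: link each point to its first (nearest) neighbour
    and return the connected components as integer cluster labels.
    Linking i to kappa1(i) also joins points that share a first neighbour,
    which is FINCH's third adjacency condition."""
    n = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)            # a point is not its own neighbour
    first = d.argmin(axis=1)
    parent = list(range(n))                # union-find over the 1-NN graph

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in enumerate(first):
        parent[find(i)] = find(int(j))
    _, labels = np.unique([find(i) for i in range(n)], return_inverse=True)
    return labels


def split_boundary_central(X_min, X_maj, k=5):
    """Return a boolean mask that is True where a minority sample is 'boundary'.
    Hypothetical rule: at least half of its k nearest neighbours (searched
    over minority + majority samples) belong to the majority class."""
    X_all = np.vstack([X_min, X_maj])
    is_maj = np.r_[np.zeros(len(X_min), bool), np.ones(len(X_maj), bool)]
    d = np.linalg.norm(X_min[:, None, :] - X_all[None, :, :], axis=-1)
    idx = np.arange(len(X_min))
    d[idx, idx] = np.inf                   # exclude each sample itself
    k = min(k, len(X_all) - 1)
    nn = np.argsort(d, axis=1)[:, :k]
    return is_maj[nn].mean(axis=1) >= 0.5


def np_smote_sketch(X_min, X_maj, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples (illustrative only)."""
    rng = np.random.default_rng(seed)
    labels = finch_first_partition(X_min)
    boundary = split_boundary_central(X_min, X_maj, k)
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))                  # random seed sample
        members = np.flatnonzero(labels == labels[i])
        if boundary[i]:
            # Assumption: a boundary sample moves only part of the way toward
            # its cluster centroid, i.e. into the interior of its own cluster.
            target = X_min[members].mean(axis=0)
            gap = rng.uniform(0.0, 0.5)
        else:
            # SMOTE-style interpolation with a random mate from the same cluster.
            mates = members[members != i]
            target = X_min[rng.choice(mates)] if len(mates) else X_min[i]
            gap = rng.uniform(0.0, 1.0)
        out.append(X_min[i] + gap * (target - X_min[i]))
    return np.array(out).reshape(n_new, X_min.shape[1])
```

Because every synthetic point is interpolated inside the cluster its seed sample belongs to, two well-separated minority groups (for example, points around (0, 0) and around (10, 10)) each receive new samples only within their own region, which is the intended benefit of clustering before oversampling.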

Acknowledgments

This work is supported by the Jiangsu Petrochemical Process Key Equipment Digital Twin Technology Engineering Research Center Open Project (Project number DTEC202103) and by the project Research and Development of Key Technologies of Smart Clothing Enterprise Management Cloud Platform (Project number BY2022218).

Author information

Correspondence to Ning Li.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Xu, S., Li, Z., Yuan, B., Yang, G., Wang, X., Li, N. (2023). A No Parameter Synthetic Minority Oversampling Technique Based on Finch for Imbalanced Data. In: Huang, DS., Premaratne, P., Jin, B., Qu, B., Jo, KH., Hussain, A. (eds) Advanced Intelligent Computing Technology and Applications. ICIC 2023. Lecture Notes in Computer Science, vol 14089. Springer, Singapore. https://doi.org/10.1007/978-981-99-4752-2_31

  • DOI: https://doi.org/10.1007/978-981-99-4752-2_31

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-4751-5

  • Online ISBN: 978-981-99-4752-2

  • eBook Packages: Computer Science, Computer Science (R0)
