DOI: 10.1145/3573942.3574014
research-article

An Anti-Noise Hybrid Clustering Oversampling Technique for Imbalanced Data Classification

Published: 16 May 2023

ABSTRACT

Oversampling techniques are widely used in imbalanced data classification. Noise handling plays an important role in this field because noisy samples directly affect the distribution of newly synthesized samples. We propose ANCO, a new anti-noise hybrid clustering oversampling technique for imbalanced data classification that synthesizes high-quality samples. The algorithm is the first to combine noise filtering and optimization within an oversampling method. First, a processing strategy for noise filtering and optimization is designed based on sample location information. Then, nearest-neighbor interpolation between samples is used to create new samples. To compare the performance of our approach with representative oversampling methods, five datasets with varying imbalance ratios from the KEEL repository are chosen for testing. The results show that the ANCO algorithm improves the classifier's overall performance.
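
The abstract names two stages: location-based noise filtering, then nearest-neighbor interpolation. As a rough illustration only, the sketch below shows one common way to realize such stages. It is not the ANCO algorithm, whose details are not given on this page. The noise rule (a minority sample is treated as noise when none of its k nearest neighbors is a minority sample), the SMOTE-style interpolation, and all function and parameter names (`filter_noise`, `interpolate_minority`, `oversample`, `k`) are assumptions made for this example. The clustering and optimization steps are omitted.

```python
import numpy as np


def pairwise_dist(A, B):
    """Euclidean distance between every row of A and every row of B."""
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)


def filter_noise(X, y, minority_label, k=5):
    """Assumed noise rule: drop a minority sample when none of its
    k nearest neighbours carries the minority label."""
    k = min(k, len(X) - 1)
    d = pairwise_dist(X, X)
    np.fill_diagonal(d, np.inf)            # a sample is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]      # indices of the k nearest neighbours
    same = (y[nn] == minority_label).sum(axis=1)
    noisy = (y == minority_label) & (same == 0)
    return X[~noisy], y[~noisy]


def interpolate_minority(X_min, n_new, k=5, rng=None):
    """SMOTE-style interpolation: each new sample lies on the segment
    between a minority sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(rng)
    k = min(k, len(X_min) - 1)
    d = pairwise_dist(X_min, X_min)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)     # seed samples
    neigh = nn[base, rng.integers(0, k, size=n_new)]   # one neighbour per seed
    gap = rng.random((n_new, 1))                       # position along the segment
    return X_min[base] + gap * (X_min[neigh] - X_min[base])


def oversample(X, y, minority_label, k=5, rng=None):
    """Filter noise, then synthesize minority samples until the minority
    count matches the number of remaining non-minority samples."""
    X_f, y_f = filter_noise(X, y, minority_label, k)
    X_min = X_f[y_f == minority_label]
    n_new = int((y_f != minority_label).sum()) - len(X_min)
    if n_new <= 0 or len(X_min) < 2:
        return X_f, y_f
    X_new = interpolate_minority(X_min, n_new, k, rng)
    y_new = np.full(n_new, minority_label, dtype=y.dtype)
    return np.vstack([X_f, X_new]), np.concatenate([y_f, y_new])


# Usage on made-up data: a majority cluster, a small minority cluster,
# and one minority-labelled point placed inside the majority cluster.
data_rng = np.random.default_rng(0)
X_demo = np.vstack([
    data_rng.normal(0.0, 1.0, size=(40, 2)),
    data_rng.normal(4.0, 0.5, size=(8, 2)),
    [[0.0, 0.0]],
])
y_demo = np.array([0] * 40 + [1] * 9)
X_res, y_res = oversample(X_demo, y_demo, minority_label=1, k=5, rng=0)
```

Filtering before interpolation means a mislabelled or isolated minority point cannot act as a seed or a neighbor, so no synthetic samples are placed along segments that cross the majority region. That is the effect the abstract attributes to noise on the distribution of synthesized samples.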


Published in

AIPR '22: Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition
September 2022
1221 pages
ISBN: 9781450396899
DOI: 10.1145/3573942

Copyright © 2022 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States


Qualifiers

• research-article
• Research
• Refereed limited