DOI: 10.1145/3656766.3656820
research-article

A Hybrid Nearest Neighbor Based SMOTE Oversampling Algorithm

Published: 01 June 2024

Abstract

The classic SMOTE method is susceptible to noise, and the quality of the samples it generates depends heavily on the quality of the original samples. Moreover, traditional KNN-based noise filtering is not effective on data with a complex distribution. This paper therefore proposes a new SMOTE oversampling technique based on nearest-neighbor mixed noise reduction (MNR-SMOTE). The advantage of the MNR-SMOTE algorithm is that noise filtering is performed on both the minority and the majority samples, so that SMOTE produces high-quality samples. Experimental results on 10 UCI datasets show that MNR-SMOTE outperforms 10 other advanced oversampling methods in terms of AUC and BACC, which verifies the effectiveness of the proposed algorithm for imbalanced data classification.
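
The abstract does not spell out the MNR-SMOTE filtering rule, so the following is only a minimal sketch of the general pipeline it describes: filter noise in both classes, then run SMOTE on the cleaned minority samples. The filter here is a plain k-nearest-neighbor vote, used purely as a stand-in for the paper's mixed noise reduction; the function names, the 0.5 agreement threshold, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def knn_indices(X, k):
    """Indices of the k nearest neighbours of each row of X (self excluded)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)          # a point is never its own neighbour
    return np.argsort(d, axis=1)[:, :k]

def knn_noise_filter(X, y, k=3):
    """Stand-in filter (NOT the paper's rule): keep a sample of either class
    only if at least half of its k nearest neighbours share its label."""
    nn = knn_indices(X, k)
    agree = (y[nn] == y[:, None]).mean(axis=1)
    keep = agree >= 0.5
    return X[keep], y[keep]

def smote(X_min, n_new, k=3, rng=None):
    """Classic SMOTE step: interpolate between a minority sample and one of
    its k nearest minority neighbours."""
    rng = np.random.default_rng(rng)
    k = min(k, len(X_min) - 1)
    nn = knn_indices(X_min, k)
    base = rng.integers(0, len(X_min), size=n_new)       # seed samples
    neigh = nn[base, rng.integers(0, k, size=n_new)]     # one neighbour each
    gap = rng.random((n_new, 1))                         # position on segment
    return X_min[base] + gap * (X_min[neigh] - X_min[base])

def filter_then_smote(X, y, minority=1, k=3, rng=None):
    """Filter both classes, then oversample the minority up to parity."""
    Xf, yf = knn_noise_filter(X, y, k)
    X_min = Xf[yf == minority]
    n_new = int((yf != minority).sum() - len(X_min))
    if n_new <= 0 or len(X_min) < 2:
        return Xf, yf
    X_syn = smote(X_min, n_new, k, rng)
    return np.vstack([Xf, X_syn]), np.concatenate([yf, np.full(n_new, minority)])

# Toy data (hypothetical): two well-separated clusters plus one mislabeled
# point planted inside each cluster to play the role of noise.
data_rng = np.random.default_rng(0)
X_maj = data_rng.normal(0.0, 0.3, size=(20, 2))
X_mnr = data_rng.normal(5.0, 0.3, size=(6, 2))
X = np.vstack([X_maj, X_mnr, [[0.0, 0.0]], [[5.0, 5.0]]])
y = np.array([0] * 20 + [1] * 6 + [1, 0])   # last two labels are the noise

X_bal, y_bal = filter_then_smote(X, y, minority=1, k=3, rng=1)
```

On this toy set the two planted points are discarded by the filter, and the synthetic minority samples are interpolated only between the clean minority points. A simple vote like this is exactly the kind of KNN filter the abstract says breaks down on complex distributions, which is the gap MNR-SMOTE is meant to address.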

    Published In

    ICBAR '23: Proceedings of the 2023 3rd International Conference on Big Data, Artificial Intelligence and Risk Management
    November 2023
    1156 pages
    ISBN: 9798400716478
    DOI: 10.1145/3656766

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICBAR 2023
