ABSTRACT
Oversampling techniques are widely used in imbalanced data classification. Noise handling plays an important role in this setting because noisy samples directly affect the distribution of newly synthesized samples. We propose ANCO, a new anti-noise hybrid clustering oversampling technique that synthesizes high-quality samples for imbalanced data classification. The algorithm is the first to combine noise filtering and optimization within an oversampling method. First, a noise filtering and optimization strategy is designed based on sample location information. Then, new samples are created by nearest-neighbor interpolation between existing samples. To compare our approach with representative oversampling methods, we test on five KEEL datasets with varying imbalance ratios. The results show that the ANCO algorithm improves the classifier's overall performance.
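The two stages named in the abstract (filter noise, then interpolate between nearest neighbors) can be illustrated with a minimal sketch. This is not the ANCO algorithm: the abstract does not specify its filtering rule, its optimization step, or its clustering, so the sketch assumes a simple k-nearest-neighbor noise filter followed by SMOTE-style interpolation. The function names, the value of `k`, and the toy data are all hypothetical.

```python
import math
import random

def filter_noise(minority, majority, k=3):
    """Assumed filter rule: drop a minority point if all of its k nearest
    neighbours (over both classes) belong to the majority class."""
    kept = []
    for p in minority:
        # Label 1 = minority, 0 = majority; exclude the point itself.
        others = [(q, 1) for q in minority if q is not p]
        others += [(q, 0) for q in majority]
        nearest = sorted(others, key=lambda item: math.dist(p, item[0]))[:k]
        if any(label == 1 for _, label in nearest):
            kept.append(p)
    return kept

def interpolate(minority, n_new, k=3, seed=0):
    """SMOTE-style step: place each new sample at a random position on the
    segment between a minority point and one of its k nearest minority
    neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        candidates = [q for q in minority if q is not base]
        neighbours = sorted(candidates, key=lambda q: math.dist(base, q))[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, partner)))
    return synthetic

# Toy data (hypothetical): four clustered minority points plus one minority
# point sitting inside the majority cluster, which acts as noise.
minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2), (5.0, 5.0)]
majority = [(5.1, 5.0), (4.9, 5.2), (5.0, 4.8), (5.2, 5.1), (4.8, 4.9), (6.0, 6.0)]

clean = filter_noise(minority, majority, k=3)
new_samples = interpolate(clean, n_new=6, k=3)
```

In this toy data the point `(5.0, 5.0)` is discarded before interpolation, so no synthetic sample is generated along a segment reaching into the majority region. That is the effect the abstract attributes to noise affecting the distribution of synthesized samples.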
Index Terms
- An Anti-Noise Hybrid Clustering Oversampling Technique for Imbalanced Data Classification