ABSTRACT
In recent years, the imbalanced classification problem has received much attention. SMOTE is one of the most popular methods for improving the performance of classifiers on imbalanced data. It changes the distribution of an imbalanced data set by adding synthetic minority-class samples. However, SMOTE has limitations of its own: the generated samples may be noisy, and they may further blur the class boundary. These problems are especially pronounced when some samples carry label noise. Granular-ball computing is an efficient, robust, and scalable modeling method developed in the field of granular computing in recent years; partitioning a data set into granular balls yields clear decision boundaries. Accordingly, this paper proposes a method called Granular-ball SMOTE (GBSMOTE), which addresses the above problems by first partitioning the data set with granular-ball computing and then applying SMOTE oversampling inside the granular balls. Experimental results show the effectiveness of the proposed method, which is more pronounced on data sets with label noise.
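The abstract gives no algorithmic detail, so the following is only a minimal sketch of the two-stage idea (partition first, then interpolate inside each partition), not the authors' implementation. Everything specific in it is an assumption made for illustration: the function names `split_into_balls` and `gbsmote_sketch`, the parameters `purity_threshold` and `min_size`, the split rule (assign each point to the nearest class centroid), the use of a random pair of minority samples within a ball in place of SMOTE's k-nearest-neighbour selection, and the choice to skip minority samples that fall in majority-dominated balls as possible label noise.

```python
import numpy as np


def split_into_balls(X, y, purity_threshold=0.9, min_size=4):
    """Split sample indices into groups ("balls") until each is pure or small."""
    balls, stack = [], [np.arange(len(X))]
    while stack:
        idx = stack.pop()
        labels, counts = np.unique(y[idx], return_counts=True)
        purity = counts.max() / len(idx)
        if purity >= purity_threshold or len(idx) <= min_size:
            balls.append(idx)
            continue
        # Assumed split rule: assign each point to its nearest class centroid.
        centroids = np.stack([X[idx][y[idx] == c].mean(axis=0) for c in labels])
        dists = np.linalg.norm(X[idx][:, None, :] - centroids[None, :, :], axis=2)
        nearest = dists.argmin(axis=1)
        children = [idx[nearest == k] for k in range(len(labels))]
        children = [c for c in children if len(c) > 0]
        if len(children) < 2:
            balls.append(idx)  # the split made no progress, so keep the ball
        else:
            stack.extend(children)
    return balls


def gbsmote_sketch(X, y, minority_label, n_new, purity_threshold=0.9, seed=0):
    """Add n_new synthetic minority samples, interpolating only inside balls."""
    rng = np.random.default_rng(seed)
    pools = []
    for idx in split_into_balls(X, y, purity_threshold):
        labels, counts = np.unique(y[idx], return_counts=True)
        if labels[counts.argmax()] != minority_label:
            continue  # assumption: minority points here may be label noise
        pool = X[idx][y[idx] == minority_label]
        if len(pool) >= 2:
            pools.append(pool)
    if not pools or n_new <= 0:
        return X, y
    # Pick balls in proportion to how many minority samples they hold.
    weights = np.array([len(p) for p in pools], dtype=float)
    weights /= weights.sum()
    synthetic = []
    for k in rng.choice(len(pools), size=n_new, p=weights):
        a, b = pools[k][rng.choice(len(pools[k]), size=2, replace=False)]
        synthetic.append(a + rng.random() * (b - a))  # SMOTE-style interpolation
    X_new = np.vstack([X, np.array(synthetic)])
    y_new = np.concatenate([y, np.full(n_new, minority_label, dtype=y.dtype)])
    return X_new, y_new
```

Because every synthetic point is a convex combination of two minority samples from the same ball, new samples cannot be placed across a region that the partition assigned to the majority class. This is one plausible reading of how oversampling inside granular balls limits boundary blurring.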
Index Terms
- GBSMOTE: A Robust Sampling Method Based on Granular-ball Computing and SMOTE for Class Imbalance