DOI: 10.1145/3594300.3594304
research-article

GBSMOTE: A Robust Sampling Method Based on Granular-ball Computing and SMOTE for Class Imbalance

Published: 13 September 2023

ABSTRACT

In recent years, the imbalanced classification problem has received much attention. SMOTE is one of the most popular methods for improving the performance of classification models on imbalanced data. It changes the class distribution of an imbalanced data set by adding generated minority-class samples. However, the SMOTE algorithm has limitations of its own: the generated samples may be noisy, and they may further blur the boundary between classes. These problems are especially pronounced when the data contain samples with label noise. Granular-ball computing is an efficient, robust and scalable modeling method developed in recent years in the field of granular computing, and dividing a data set into granular balls yields clear decision boundaries. Accordingly, this paper proposes a method called Granular-ball SMOTE (GBSMOTE), which addresses the above problems by first dividing the data set with granular-ball computing and then applying SMOTE oversampling inside each granular ball. The experimental results show the effectiveness of the proposed method, and the advantage is more prominent on data with label noise.
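The two-stage idea in the abstract can be illustrated with a minimal Python sketch under stated assumptions; it is not the authors' implementation. The interpolation step follows the original SMOTE of Chawla et al. (2002): a synthetic point is x + λ(x_nn − x), where x is a minority sample, x_nn is one of its k nearest minority neighbours and λ is drawn uniformly from [0, 1). The granular-ball stage is an assumption: the hypothetical helpers `two_means` and `granular_balls` recursively split a ball with plain 2-means until each ball reaches a purity threshold (the fraction of its samples carrying the majority label), and the hypothetical `gbsmote_sketch` oversamples only inside balls whose majority label is the minority class. The paper's actual ball-generation rule, purity threshold and sample-allocation scheme may differ.

```python
import math
import random
from collections import Counter


def smote(minority, n_new, k=5, rng=None):
    """Classic SMOTE interpolation (Chawla et al., 2002): for a random
    minority sample x, pick one of its k nearest minority neighbours x_nn
    and emit x + lam * (x_nn - x) with lam ~ U(0, 1)."""
    if rng is None:
        rng = random.Random(0)
    if len(minority) < 2:
        return []  # no neighbour to interpolate towards
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbours)
        lam = rng.random()
        synthetic.append(tuple(a + lam * (b - a) for a, b in zip(x, nn)))
    return synthetic


def _mean(pts):
    return tuple(sum(col) / len(pts) for col in zip(*pts))


def two_means(points, iters=10):
    """Plain 2-means (Lloyd's algorithm) seeded with points[0] and the point
    farthest from it. Returns two index lists; one may be empty."""
    c1 = points[0]
    c2 = max(points, key=lambda p: math.dist(c1, p))
    groups = ([], [])
    for _ in range(iters):
        groups = ([], [])
        for idx, p in enumerate(points):
            nearer_first = math.dist(p, c1) <= math.dist(p, c2)
            groups[0 if nearer_first else 1].append(idx)
        if not groups[0] or not groups[1]:
            break  # degenerate split, e.g. all points identical
        c1 = _mean([points[i] for i in groups[0]])
        c2 = _mean([points[i] for i in groups[1]])
    return groups


def granular_balls(points, labels, threshold=1.0):
    """Hypothetical granular-ball partition (an assumption, not the paper's
    exact rule): split a ball with 2-means until its purity reaches threshold."""
    purity = max(Counter(labels).values()) / len(labels)
    if purity >= threshold:
        return [(points, labels)]
    left, right = two_means(points)
    if not left or not right:
        return [(points, labels)]  # cannot split further
    balls = []
    for idx in (left, right):
        balls += granular_balls([points[i] for i in idx],
                                [labels[i] for i in idx], threshold)
    return balls


def gbsmote_sketch(points, labels, minority_label, n_new, k=5,
                   threshold=1.0, seed=0):
    """Partition into granular balls, then run SMOTE only inside balls whose
    majority label is the minority class. Each synthetic sample comes from one
    ball chosen with probability proportional to its minority count (an
    assumed allocation rule)."""
    rng = random.Random(seed)
    pools = []
    for ball_pts, ball_lbls in granular_balls(points, labels, threshold):
        majority = Counter(ball_lbls).most_common(1)[0][0]
        minority = [p for p, y in zip(ball_pts, ball_lbls) if y == minority_label]
        if majority == minority_label and len(minority) >= 2:
            pools.append(minority)  # single-sample balls cannot interpolate
    synthetic = []
    if not pools:
        return synthetic
    weights = [len(pool) for pool in pools]
    for _ in range(n_new):
        pool = rng.choices(pools, weights=weights)[0]
        synthetic += smote(pool, 1, k, rng)
    return synthetic


if __name__ == "__main__":
    # Majority cluster near the origin, minority cluster near (5.5, 5.5),
    # plus one minority-labelled point (0.5, 0.5) inside the majority cluster.
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
           (5, 5), (5, 6), (6, 5), (6, 6)]
    lbl = [0, 0, 0, 0, 1, 1, 1, 1, 1]
    new = gbsmote_sketch(pts, lbl, minority_label=1, n_new=20)
    print(len(new), "synthetic samples")
```

In this toy data the minority point at (0.5, 0.5) behaves like a label-noise sample. With a purity threshold of 1.0 the ball splitting isolates it in a singleton ball, so no synthetic point is interpolated towards it and every generated sample stays inside the square spanned by the clean minority cluster. Plain SMOTE over all five minority samples, by contrast, could place synthetic points on the segment between (0.5, 0.5) and that cluster, that is, across the majority region.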

Published in
ICMAI '23: Proceedings of the 2023 8th International Conference on Mathematics and Artificial Intelligence
April 2023
106 pages
ISBN: 9781450399982
DOI: 10.1145/3594300

Copyright © 2023 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Qualifiers: research-article, Research, Refereed limited