ISSN: 2577-610X


Journal of Data Intelligence, ISSN: 2577-610X, published since 2020
Vol. 2, No. 1, March 2021

An Ensemble Framework of Multi-ratio Undersampling-based Imbalanced Classification (pp030-046)
        
Takahiro Komamizu, Yasuhiro Ogawa, and Katsuhiko Toyama
         
doi: https://doi.org/10.26421/JDI2.1-2
Abstract: Class imbalance is common in real-world data, and it is problematic because it degrades classification performance through biased supervision. Undersampling is an effective resampling approach to class imbalance. Conventional undersampling-based approaches use a single fixed sampling ratio. However, different sampling ratios have different preferences toward classes. This paper proposes MUEnsemble, an undersampling-based ensemble framework. The framework combines weak classifiers built at different sampling ratios, and it allows a flexible design for weighting the weak classifiers across those ratios. To demonstrate the design principle, a uniform weighting function and a Gaussian weighting function are presented. An extensive experimental evaluation shows that MUEnsemble outperforms state-of-the-art undersampling-based and oversampling-based methods in terms of recall, gmean, F-measure, and ROC-AUC. The evaluation also shows that the Gaussian weighting function is superior to the uniform weighting function, which indicates that the Gaussian weighting function can capture the different preferences of sampling ratios toward classes. An investigation into the effects of the parameters of the Gaussian weighting function shows that these parameters can be chosen in terms of recall, which is preferred in many real-world applications.
Key words: imbalanced classification, resampling, undersampling, ensemble
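
Illustrative sketch (not taken from the paper): the abstract describes weighting weak classifiers built at different undersampling ratios with either a uniform or a Gaussian function. The Python below is one minimal way such a scheme could look. The function names, the parameters `mu` and `sigma`, the definition of a sampling ratio as majority samples per minority sample, and the weighted-vote rule are all assumptions made for illustration; the authors' actual formulation may differ.

```python
import math
import random


def uniform_weights(ratios):
    """Assign the same weight to every sampling ratio."""
    return [1.0 / len(ratios)] * len(ratios)


def gaussian_weights(ratios, mu=1.0, sigma=0.5):
    """Weights proportional to a Gaussian centred on ratio `mu`.

    `mu` and `sigma` are hypothetical parameter names; the weights are
    normalised so that they sum to 1.
    """
    raw = [math.exp(-((r - mu) ** 2) / (2 * sigma ** 2)) for r in ratios]
    total = sum(raw)
    return [w / total for w in raw]


def undersample(majority, minority, ratio, rng):
    """Keep every minority sample and draw about ratio * len(minority)
    majority samples without replacement (assumed meaning of "ratio")."""
    k = min(len(majority), max(1, round(ratio * len(minority))))
    return rng.sample(majority, k), list(minority)


def weighted_vote(predictions, weights):
    """Combine 0/1 predictions (1 = minority class), one per sampling ratio.

    Returns 1 when the weighted share of minority votes reaches 0.5.
    """
    score = sum(w * p for w, p in zip(weights, predictions))
    return 1 if score >= 0.5 else 0


if __name__ == "__main__":
    ratios = [0.5, 1.0, 2.0]
    print(uniform_weights(ratios))
    print(gaussian_weights(ratios))

    rng = random.Random(0)
    majority = list(range(100))
    minority = list(range(100,110))
    maj_sample, min_sample = undersample(majority, minority, 2.0, rng)
    print(len(maj_sample), len(min_sample))
```

In this sketch, each ratio would get its own weak classifier trained on the output of `undersample`, and `weighted_vote` would merge their predictions. A Gaussian centred on a particular ratio gives classifiers near that ratio more influence, whereas the uniform function treats all ratios alike.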