Abstract:
Real-world datasets are often imbalanced, with an important class having many fewer examples than other classes. In medical data, normal examples typically greatly outnumber disease examples. A classifier learned from imbalanced data will tend to be very good at predicting examples in the larger (normal) class, yet the smaller (disease) class is typically of more interest. Imbalance is usually dealt with at the feature vector level (by creating synthetic feature vectors or discarding some examples from the larger class) or by assigning differential costs to errors. Here, we introduce a novel method for over-sampling minority class examples at the image level, rather than the feature vector level. Our method was applied to the problem of Glioblastoma patient survival group prediction. Synthetic minority class examples were created by adding Gaussian noise to original medical images from the minority class. Uniform local binary pattern (LBP) histogram features were then extracted from the original and synthetic image examples and used to train a random forests classifier. Experimental results show the new method (Image SMOTE) increased minority class predictive accuracy and also the AUC (area under the receiver operating characteristic curve), compared to using the imbalanced dataset directly or to creating synthetic feature vectors.
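The sketch below illustrates the pipeline the abstract describes: over-sample the minority class by adding Gaussian noise to its images, extract uniform LBP histogram features, and train a random forest. The noise level, LBP parameters, number of synthetic copies, and forest size are illustrative assumptions, not values reported in the paper.

```python
# Illustrative sketch of image-level minority over-sampling (Image SMOTE style).
# All parameter values here are assumptions for demonstration only.
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.ensemble import RandomForestClassifier


def synthesize_minority_images(images, n_copies=3, noise_std=0.02, seed=0):
    """Create synthetic minority-class images by adding Gaussian noise."""
    rng = np.random.default_rng(seed)
    synthetic = []
    for img in images:
        for _ in range(n_copies):
            noisy = img + rng.normal(0.0, noise_std, size=img.shape)
            synthetic.append(np.clip(noisy, 0.0, 1.0))  # keep intensities in [0, 1]
    return synthetic


def lbp_histogram(img, n_points=8, radius=1):
    """Uniform LBP histogram feature vector for one grayscale image."""
    codes = local_binary_pattern(img, n_points, radius, method="uniform")
    n_bins = n_points + 2  # uniform patterns plus one bin for non-uniform codes
    hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins), density=True)
    return hist


def train_with_image_smote(majority_imgs, minority_imgs):
    """Over-sample the minority (disease) class at the image level,
    then fit a random forest on LBP histogram features."""
    minority_aug = list(minority_imgs) + synthesize_minority_images(minority_imgs)
    all_imgs = list(majority_imgs) + minority_aug
    X = np.array([lbp_histogram(im) for im in all_imgs])
    y = np.array([0] * len(majority_imgs) + [1] * len(minority_aug))
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    return clf.fit(X, y)
```

In this sketch, over-sampling happens before feature extraction, which is the key difference from feature-vector-level SMOTE: the synthetic examples are images, and their LBP histograms are computed exactly as for real images.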
Date of Conference: 05-08 October 2017
Date Added to IEEE Xplore: 30 November 2017