Authors:
Nada Boudegzdame 1; Karima Sedki 1; Rosy Tspora 2,3,4 and Jean-Baptiste Lamy 1
Affiliations:
1 LIMICS, INSERM, Université Sorbonne Paris Nord, Sorbonne Université, France
2 INSERM, Université de Paris Cité, Sorbonne Université, Cordeliers Research Center, France
3 HeKA, INRIA, France
4 Department of Medical Informatics, Hôpital Européen Georges-Pompidou, AP-HP, France
Keyword(s):
Imbalanced Data, Oversampling, SMOTE, Class Imbalance, Data Augmentation, Machine Learning, Neural Networks, Synthetic Data, Synthetic Sample Detector, Generative Adversarial Networks.
Abstract:
Oversampling algorithms are commonly used in machine learning to address class imbalance by generating synthetic samples of the minority class. While oversampling can improve a classification model's performance on minority classes, our research reveals that models often learn to detect the noise introduced by oversampling algorithms rather than the underlying patterns. To overcome this issue, this article proposes a method that identifies and filters out unrealistic synthetic data, using advanced techniques such as a neural network to detect unrealistic synthetic samples. The aim is to enhance the quality of oversampled datasets and improve the ability of machine learning models to uncover genuine patterns. The effectiveness of the proposed approach is examined and evaluated, and the results show enhanced model performance.
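The filtering idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the interpolation step is a simplified stand-in for SMOTE (random pairs instead of nearest neighbours), the detector is a single logistic neuron rather than a full neural network, and all function names, the `keep_ratio` parameter, and the toy data are hypothetical choices made only for illustration.

```python
import math
import random


def smote_like(minority, n_new, rng):
    """Simplified SMOTE stand-in: interpolate between random minority pairs."""
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)
        t = rng.random()
        synthetic.append([x + t * (y - x) for x, y in zip(a, b)])
    return synthetic


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_detector(real, synthetic, epochs=200, lr=0.1):
    """Train a single-neuron detector: label 0 = real, label 1 = synthetic."""
    data = [(x, 0.0) for x in real] + [(x, 1.0) for x in synthetic]
    w, b = [0.0] * len(real[0]), 0.0
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - y  # gradient of the log loss w.r.t. the pre-activation
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def synthetic_score(model, x):
    """Estimated probability that x is synthetic (higher = less realistic)."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def filter_synthetic(model, synthetic, keep_ratio=0.5):
    """Keep the fraction of synthetic samples the detector finds most realistic."""
    ranked = sorted(synthetic, key=lambda x: synthetic_score(model, x))
    return ranked[: max(1, int(len(ranked) * keep_ratio))]


# Toy data (assumption): 30 two-dimensional minority samples.
rng = random.Random(0)
minority = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(30)]
synthetic = smote_like(minority, 60, rng)

model = train_detector(minority, synthetic)
kept = filter_synthetic(model, synthetic, keep_ratio=0.5)
augmented = minority + kept  # real samples plus the retained synthetic ones
print(len(synthetic), len(kept), len(augmented))
```

Ranking by detector score and keeping a fixed fraction avoids having to pick an absolute probability threshold, which is one possible design choice among several; a threshold-based rule or a deeper detector could be substituted without changing the overall pipeline.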