Authors:
Yuxuan Yang; Hadi Khorshidi and Uwe Aickelin
Affiliation:
School of Computing and Information Systems, The University of Melbourne, Grattan Street, Parkville, Victoria, Australia
Keyword(s):
Over-sampling, Diversity Optimisation, Genetic Algorithm, Imbalanced Data, Clustering.
Abstract:
Imbalanced data is common in real-life classification tasks. Mainstream machine learning algorithms typically assume that the classes in the underlying dataset are relatively well balanced. When this assumption fails, the result can be a biased representation of the models’ performance. This has encouraged the use of re-sampling techniques to generate more balanced datasets. However, mainstream re-sampling methods fail to account for the distribution of minority data and the diversity within generated instances. Therefore, in this paper, we propose a data-generation algorithm, Cluster-based Diversity Over-sampling (CDO), which considers the distribution of minority instances during data generation. Diversity optimisation is utilised to promote diversity within the generated data. We have conducted extensive experiments on synthetic and real-world datasets to evaluate the performance of CDO in comparison with SMOTE-based and diversity-based methods (DADO, DIWO, BL-SMOTE, DB-SMOTE, and MAHAKIL). The experiments show the superiority of CDO.