Abstract:
Classification of imbalanced datasets has become one of the most challenging problems in big data mining. Because positive samples are far fewer than negative samples, traditional learning algorithms tend to suffer from low accuracy, poor generalization performance, and other defects. Ensemble construction is an important approach to this problem. In particular, ensemble construction algorithms based on random under-sampling or clustering can effectively improve classification performance. However, the former easily loses information and the latter increases complexity. In this paper, we propose ACUS, an improved ensemble algorithm based on automatic clustering and under-sampling. ACUS first clusters the samples according to their weights, and then constructs balanced datasets that consist of a certain percentage of the majority class and all of the minority class from each cluster. These datasets are then used with the AdaBoost algorithm to build an ensemble classifier. Experimental results demonstrate the advantages of the proposed algorithm in terms of accuracy, simplicity, and stability.
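The resampling step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the weight-based automatic clustering works, so cluster assignments are assumed to be given, and the function name `build_balanced_dataset` and the parameter `majority_fraction` are hypothetical. The AdaBoost stage is omitted.

```python
import random
from collections import defaultdict


def build_balanced_dataset(samples, labels, clusters, majority_fraction, seed=0):
    """Cluster-based under-sampling (illustrative sketch).

    Keeps every minority sample (label 1) and, from each cluster, a random
    fraction of the majority samples (label 0). Cluster assignments are
    assumed to be precomputed; how ACUS derives them is not shown here.
    """
    rng = random.Random(seed)
    majority_by_cluster = defaultdict(list)
    kept = []

    for i, (y, c) in enumerate(zip(labels, clusters)):
        if y == 1:
            kept.append(i)                     # keep all minority samples
        else:
            majority_by_cluster[c].append(i)   # group majority by cluster

    for c in sorted(majority_by_cluster):
        idxs = majority_by_cluster[c]
        # At least one majority sample is drawn from every non-empty cluster.
        k = max(1, round(majority_fraction * len(idxs)))
        kept.extend(rng.sample(idxs, k))

    kept.sort()
    return [samples[i] for i in kept], [labels[i] for i in kept]


# Toy data: 4 minority and 16 majority samples spread over two clusters.
samples = list(range(20))
labels = [1] * 4 + [0] * 16
clusters = [0, 0, 1, 1] + [0] * 8 + [1] * 8

X, y = build_balanced_dataset(samples, labels, clusters, majority_fraction=0.25)
```

Sampling the majority class separately within each cluster, rather than from the pooled majority class, is what is meant to reduce the information loss of plain random under-sampling: every cluster stays represented in the reduced dataset.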
Published in: 2016 IEEE 35th International Performance Computing and Communications Conference (IPCCC)
Date of Conference: 09-11 December 2016
Date Added to IEEE Xplore: 19 January 2017
ISBN Information:
Electronic ISSN: 2374-9628