ABSTRACT
The class imbalance problem arises in many real-world domains, where traditional classifiers often fail to detect minority-class objects because they pay too little attention to them. To address this problem, this paper proposes GAUS (genetic algorithm based under-sampling), a new under-sampling technique designed to overcome limitations of existing methods such as performance instability and loss of information about the data distribution. To select informative majority objects, GAUS seeks to maximize the performance of a prototype classifier whose prototypes minimize the loss between the distributions of the original and the under-sampled majority objects. We confirmed the effectiveness of GAUS on real-world datasets.
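The selection loop described in the abstract can be illustrated with a small genetic algorithm. The abstract does not give the chromosome encoding, the fitness formula, the prototype classifier, or the distribution loss, so everything below is an assumption for illustration only, not the authors' GAUS: each chromosome is a bit mask over the majority objects, the prototype classifier is a leave-one-out 1-nearest-neighbour rule over all minority objects plus the kept majority objects, performance is measured by G-mean, and the hypothetical `distribution_loss` (gap between per-feature means and variances) is weighted by a hypothetical parameter `lam`.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def g_mean(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Geometric mean of minority (label 1) and majority (label 0) recall."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return math.sqrt((tp / pos) * (tn / neg))


def mean_and_var(points: List[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-feature mean and population variance."""
    n, d = len(points), len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(d)]
    variances = [sum((p[j] - means[j]) ** 2 for p in points) / n for j in range(d)]
    return means, variances


def distribution_loss(original: List[Sequence[float]], subset: List[Sequence[float]]) -> float:
    """Assumed loss: squared gap between per-feature means plus variances."""
    m0, v0 = mean_and_var(original)
    m1, v1 = mean_and_var(subset)
    return sq_dist(m0, m1) + sq_dist(v0, v1)


def fitness(mask, X, y, maj_idx, lam) -> float:
    """G-mean of a leave-one-out 1-NN prototype classifier minus a distribution penalty."""
    kept = [i for i, bit in zip(maj_idx, mask) if bit]
    if not kept:
        return -math.inf  # an empty majority subset is invalid
    protos = [i for i, label in enumerate(y) if label == 1] + kept
    preds = []
    for i in range(len(y)):
        # Nearest prototype, excluding the query object itself (leave-one-out).
        j = min((p for p in protos if p != i), key=lambda p: sq_dist(X[i], X[p]))
        preds.append(y[j])
    loss = distribution_loss([X[i] for i in maj_idx], [X[i] for i in kept])
    return g_mean(y, preds) - lam * loss


def ga_undersample(X, y, pop_size=20, generations=20, p_mut=0.05, lam=0.5, seed=0) -> List[int]:
    """Return sorted indices of the majority objects kept by the best chromosome."""
    rng = random.Random(seed)
    maj_idx = [i for i, label in enumerate(y) if label == 0]
    cache: Dict[Tuple[bool, ...], float] = {}

    def score(mask: List[bool]) -> float:
        key = tuple(mask)
        if key not in cache:
            cache[key] = fitness(mask, X, y, maj_idx, lam)
        return cache[key]

    def tournament() -> List[bool]:
        a, b = rng.sample(pop, 2)
        return a if score(a) >= score(b) else b

    # Each chromosome holds one bit per majority object: True = keep it.
    pop = [[rng.random() < 0.5 for _ in maj_idx] for _ in range(pop_size)]
    best = max(pop, key=score)
    for _ in range(generations):
        children = [best[:]]  # elitism: the best chromosome always survives
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            # Uniform crossover followed by bit-flip mutation.
            child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]
            child = [(not g) if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
        best = max(pop, key=score)
    return [i for i, bit in zip(maj_idx, best) if bit]


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic 2-D data: 40 majority objects and 8 minority objects.
    X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40)]
    X += [(rng.gauss(1.5, 0.5), rng.gauss(1.5, 0.5)) for _ in range(8)]
    y = [0] * 40 + [1] * 8
    kept = ga_undersample(X, y)
    print(f"kept {len(kept)} of 40 majority objects")
```

The leave-one-out rule matters in this sketch: without it, every kept object would be its own nearest prototype, so the fitness would trivially reward keeping all majority objects. The weight `lam` trades classifier performance against how closely the kept subset preserves the original majority distribution.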
Index Terms
- A New Under-Sampling Method Using Genetic Algorithm for Imbalanced Data Classification