Abstract
Unbalanced data in a classification problem appears when there are many more instances of some classes than others. Several solutions were proposed to solve this problem at data level by under-sampling. The aim of this work is to propose evolutionary prototype selection algorithms that tackle the problem of unbalanced data by using a new fitness function. The results obtained show that a balancing of data performed by evolutionary under-sampling outperforms previously proposed under-sampling methods in classification accuracy, obtaining reduced subsets and getting a good balance on data.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Chawla, N.V., Japkowicz, N., Kotcz, A.: Editorial: special issue on learning from imbalanced data sets. SIGKDD Explorations 6, 1–6 (2004)
Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: Smote: Synthetic minority over-sampling technique. Journal of Artificial Intelligence and Research 16, 321–357 (2002)
Tan, S.: Neighbor-weighted k-nearest neighbor for unbalanced text corpus. Expert Systems with Applications 28, 667–671 (2005)
Batista, G.E.A.P.A., Prati, R.C., Monard, M.C.: A study of the behavior of several methods for balancing machine learning training data. SIGKDD Explor. Newsl. 6, 20–29 (2004)
Wilson, D.R., Martinez, T.R.: Reduction techniques for instance-based learning algorithms. Machine Learning 38, 257–286 (2000)
Eiben, A.E., Smith, J.E.: Introduction to Evolutionary Computing. Springer, Heidelberg (2003)
Cano, J.R., Herrera, F., Lozano, M.: Using evolutionary algorithms as instance selection for data reduction in KDD: An experimental study. IEEE Transactions on Evolutionary Computation 7, 561–575 (2003)
Eshelman, L.J.: The CHC adaptative search algorithm: How to safe search when engaging in nontraditional genetic recombination. In: FOGA, pp. 265–283 (1990)
Baluja, S.: Population-based incremental learning: A method for integrating genetic search based function optimization and competitive learning. Technical report, Pittsburgh, PA, USA (1994)
Tomek, I.: Two modifications of CNN. IEEE Transactions on Systems, Man, and Communications 6, 769–772 (1976)
Hart, P.E.: The condensed nearest neighbour rule. IEEE Transactions on Information Theory 18, 515–516 (1968)
Kubat, M., Matwin, S.: Addressing the course of imbalanced training sets: Onesided selection. In: ICML, pp. 179–186 (1997)
Laurikkala, J.: Improving identification of difficult small classes by balancing class distribution. In: Quaglini, S., Barahona, P., Andreassen, S. (eds.) AIME 2001. LNCS (LNAI), vol. 2101, pp. 63–66. Springer, Heidelberg (2001)
Wilson, D.L.: Asymptotic properties of nearest neighbor rules using edited data. IEEE Transactions on Systems, Man and Cybernetics 2, 408–421 (1972)
Aha, D.W., Kibler, D., Albert, M.K.: Instance-based learning algorithms. Machine Learning 7, 37–66 (1991)
Barandela, R., Sánchez, J.S., García, V., Rangel, E.: Strategies for learning in class imbalance problems. Pattern Recognition 36, 849–851 (2003)
Newman, D.J., Hettich, S., Merz, C.B.: UCI repository of machine learning databases (1998)
Demśar, J.: Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research 7, 1–30 (2006)
Wilcoxon, F.: Individual comparisons by rankings methods. Biometrics 1, 80–83 (1945)
Sheskin, D.J.: Handbook of Parametric and Nonparametric Statistical Procedures. CRC Press, Boca Raton (1997)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
García, S., Cano, J.R., Fernández, A., Herrera, F. (2006). A Proposal of Evolutionary Prototype Selection for Class Imbalance Problems. In: Corchado, E., Yin, H., Botti, V., Fyfe, C. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2006. IDEAL 2006. Lecture Notes in Computer Science, vol 4224. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11875581_168
Download citation
DOI: https://doi.org/10.1007/11875581_168
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45485-4
Online ISBN: 978-3-540-45487-8
eBook Packages: Computer ScienceComputer Science (R0)