Abstract
In real-world applications, it has often been observed that class imbalance (significant differences in class prior probabilities) can seriously degrade classifier performance, particularly on patterns belonging to the less represented classes. This effect is especially pronounced in instance-based learning because of its reliance on a dissimilarity measure. We analyze the effects of class imbalance on classifier performance and how class overlap influences those effects, as well as how it influences several techniques proposed in the literature to tackle class imbalance. We also study how these methods affect performance on both classes, not only on the minority class as is usual.
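The interaction described above can be explored with a small experiment. The following is a minimal sketch, not the authors' experimental setup: it assumes one-dimensional Gaussian classes, a plain k-nearest-neighbour rule with absolute distance, and the geometric mean of the two per-class accuracies as a summary measure. The function names, the `gap` parameter (smaller gap means more overlap), and all sample sizes are hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter


def make_data(n_major, n_minor, gap, rng):
    """1-D sample: majority ~ N(0, 1) with label 0, minority ~ N(gap, 1) with label 1.

    A smaller gap means more overlap between the two classes (illustrative assumption).
    """
    data = [(rng.gauss(0.0, 1.0), 0) for _ in range(n_major)]
    data += [(rng.gauss(gap, 1.0), 1) for _ in range(n_minor)]
    return data


def knn_predict(train, x, k=3):
    """Majority vote among the k training points closest to x (absolute distance)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def per_class_rates(train, test, k=3):
    """Return (majority-class accuracy, minority-class accuracy) on the test set."""
    hits = {0: 0, 1: 0}
    totals = {0: 0, 1: 0}
    for x, y in test:
        totals[y] += 1
        if knn_predict(train, x, k) == y:
            hits[y] += 1
    return hits[0] / totals[0], hits[1] / totals[1]


def g_mean(rate_major, rate_minor):
    """Geometric mean of the two per-class accuracies (an assumed summary measure)."""
    return math.sqrt(rate_major * rate_minor)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same numbers
    for gap in (4.0, 2.0, 1.0):          # decreasing gap = increasing overlap
        for n_minor in (500, 50, 10):    # majority fixed at 500 training points
            train = make_data(500, n_minor, gap, rng)
            test = make_data(200, 200, gap, rng)  # balanced test set
            acc_major, acc_minor = per_class_rates(train, test, k=3)
            print(
                f"gap={gap:.1f} ratio=500:{n_minor:<3d} "
                f"majority={acc_major:.2f} minority={acc_minor:.2f} "
                f"g-mean={g_mean(acc_major, acc_minor):.2f}"
            )
```

Reporting both per-class accuracies, rather than a single overall accuracy, makes it visible when a gain on the minority class is paid for by a loss on the majority class.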
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
García, V., Alejo, R., Sánchez, J.S., Sotoca, J.M., Mollineda, R.A. (2006). Combined Effects of Class Imbalance and Class Overlap on Instance-Based Classification. In: Corchado, E., Yin, H., Botti, V., Fyfe, C. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2006. IDEAL 2006. Lecture Notes in Computer Science, vol 4224. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11875581_45
DOI: https://doi.org/10.1007/11875581_45
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45485-4
Online ISBN: 978-3-540-45487-8
eBook Packages: Computer Science, Computer Science (R0)