Abstract
Several works point out class imbalance as an obstacle to applying machine learning algorithms to real-world domains. However, learning algorithms do perform well on some imbalanced domains. Thus, it does not seem fair to directly correlate class imbalance with the loss of performance of learning algorithms. In this work, we develop a systematic study to investigate whether class imbalances are truly to blame for the loss of performance of learning systems, or whether class imbalances are not a problem by themselves. Our experiments suggest that the problem is not caused by class imbalances alone, but is also related to the degree of overlap among the classes.
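The abstract's central claim is that imbalance hurts mainly when the classes overlap. A small simulation can illustrate this. What follows is a minimal sketch under illustrative assumptions, not the authors' experimental setup. It uses two one-dimensional Gaussian classes with unit variance. An error-minimising decision stump stands in for a full decision-tree learner such as C4.5, and performance is measured as recall on the minority (positive) class. The function names `make_domain`, `fit_stump`, `minority_recall` and `experiment`, and all parameter values, are hypothetical.

```python
import random

def make_domain(n, pos_frac, distance, seed):
    """Hypothetical artificial domain: two 1-D Gaussian classes (std 1).
    Negatives are centred at 0, positives at `distance`. A smaller distance
    means more class overlap; a smaller `pos_frac` means more imbalance."""
    rng = random.Random(seed)
    n_pos = max(1, round(n * pos_frac))
    data = [(rng.gauss(distance, 1.0), 1) for _ in range(n_pos)]
    data += [(rng.gauss(0.0, 1.0), 0) for _ in range(n - n_pos)]
    return data

def fit_stump(data):
    """Return the threshold t (predict positive iff x > t) with the fewest training errors."""
    points = sorted(data)
    # Start with t below every point: everything is predicted positive,
    # so each negative example is an error.
    errors = sum(1 for _, y in points if y == 0)
    best_err, best_t = errors, points[0][0] - 1.0
    for x, y in points:
        # Raising t to x flips this point to "negative": a positive becomes
        # an error, a negative stops being one.
        errors += 1 if y == 1 else -1
        if errors < best_err:
            best_err, best_t = errors, x
    return best_t

def minority_recall(threshold, data):
    """Fraction of positive (minority) examples predicted positive."""
    pos = [x for x, y in data if y == 1]
    return sum(x > threshold for x in pos) / len(pos)

def experiment(distance, pos_frac, n=2000):
    """Train on one sample, evaluate minority recall on an independent sample."""
    train = make_domain(n, pos_frac, distance, seed=1)
    test = make_domain(n, pos_frac, distance, seed=2)
    return minority_recall(fit_stump(train), test)

if __name__ == "__main__":
    for distance in (0.5, 6.0):
        for pos_frac in (0.5, 0.05):
            r = experiment(distance, pos_frac)
            print(f"distance={distance:<4} positives={pos_frac:.0%}  minority recall={r:.3f}")
```

With well-separated classes (`distance=6.0`), the stump should keep a high minority recall even at 5% positives. With heavily overlapping classes (`distance=0.5`), the same imbalance should push the error-minimising threshold so far up that almost every example is assigned to the majority class. The exact figures depend on the seeds and on the simplified learner.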
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Prati, R.C., Batista, G.E.A.P.A., Monard, M.C. (2004). Class Imbalances versus Class Overlapping: An Analysis of a Learning System Behavior. In: Monroy, R., Arroyo-Figueroa, G., Sucar, L.E., Sossa, H. (eds) MICAI 2004: Advances in Artificial Intelligence. MICAI 2004. Lecture Notes in Computer Science, vol 2972. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24694-7_32
DOI: https://doi.org/10.1007/978-3-540-24694-7_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21459-5
Online ISBN: 978-3-540-24694-7
eBook Packages: Springer Book Archive