Definition
Data are said to suffer from the Class Imbalance Problem when the class distribution is highly imbalanced, that is, when one class has far fewer examples than the others. In this setting, many classification learning algorithms have low predictive accuracy for the infrequent class. Cost-sensitive learning is a common approach to this problem.
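As a rough illustration of the cost-sensitive idea, the sketch below picks the class with the lower expected misclassification cost. It assumes a classifier that outputs a probability for the positive class, zero cost for correct predictions, and hypothetical cost values; the names cost_threshold, predict, cost_fp, and cost_fn are introduced here for illustration and do not come from the entry.

```python
def cost_threshold(cost_fp: float, cost_fn: float) -> float:
    """Probability above which predicting positive has the lower expected cost.

    Predicting positive costs (1 - p) * cost_fp in expectation, and
    predicting negative costs p * cost_fn, so positive is preferred when
    p >= cost_fp / (cost_fp + cost_fn). Correct predictions are assumed free.
    """
    return cost_fp / (cost_fp + cost_fn)


def predict(p_positive: float, cost_fp: float, cost_fn: float) -> int:
    """Return 1 (positive) or 0 (negative) by minimum expected cost."""
    return 1 if p_positive >= cost_threshold(cost_fp, cost_fn) else 0


# Hypothetical costs: missing a rare positive is 99 times worse than a false alarm.
rare_case = predict(0.05, cost_fp=1.0, cost_fn=99.0)   # threshold 0.01
equal_case = predict(0.05, cost_fp=1.0, cost_fn=1.0)   # threshold 0.5
print(rare_case, equal_case)
```

With equal costs the threshold is 0.5 and an example with a 5% positive probability is labeled negative; raising the false-negative cost lowers the threshold so the same example is labeled positive.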
Motivation and Background
Class-imbalanced datasets occur in many real-world applications. For the two-class case, one assumes without loss of generality that the minority (rare) class is the positive class and the majority class is the negative class. Often the minority class is very infrequent, making up as little as 1% of the dataset. Most traditional (cost-insensitive) classifiers applied to such a dataset are likely to predict everything as negative (the majority class). This was often regarded as a problem in learning from highly imbalanced datasets.
However, Provost (2000) describes two fundamental assumptions that are often made...
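The behavior described above, where a classifier predicts every example as negative, can be made concrete with a small sketch. It assumes a synthetic dataset of 1,000 examples with 1% positives; the dataset and the evaluate helper are illustrative assumptions, not part of the entry.

```python
def evaluate(y_true, y_pred):
    """Return (accuracy, recall on the positive class) for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    positives = sum(y_true)
    recall = tp / positives if positives else 0.0
    return accuracy, recall


# Synthetic labels: 10 positives (minority class) and 990 negatives.
y_true = [1] * 10 + [0] * 990

# A classifier that always predicts the majority (negative) class.
y_pred = [0] * len(y_true)

accuracy, recall = evaluate(y_true, y_pred)
print(f"accuracy={accuracy:.2f}, recall={recall:.2f}")
```

Overall accuracy looks high because it is dominated by the majority class, while no minority example is ever identified.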
Recommended Reading
Drummond, C., & Holte, R. (2000). Exploiting the cost (in)sensitivity of decision tree splitting criteria. In Proceedings of the seventeenth international conference on machine learning (pp. 239–246).
Drummond, C., & Holte, R. (2005). Severe class imbalance: Why better algorithms aren’t the answer. In Proceedings of the sixteenth European conference on machine learning, LNAI (Vol. 3720, pp. 539–546).
Japkowicz, N., & Stephen, S. (2002). The class imbalance problem: A systematic study. Intelligent Data Analysis, 6(5), 429–450.
Ling, C. X., & Li, C. (1998). Data mining for direct marketing – Specific problems and solutions. In Proceedings of the fourth international conference on knowledge discovery and data mining (KDD-98) (pp. 73–79).
Provost, F. (2000). Machine learning from imbalanced data sets 101. In Proceedings of the AAAI’2000 workshop on imbalanced data.
© 2011 Springer Science+Business Media, LLC
Cite this entry
Ling, C.X., Sheng, V.S. (2011). Class Imbalance Problem. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-30164-8_110
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-30768-8
Online ISBN: 978-0-387-30164-8
eBook Packages: Computer Science, Reference Module Computer Science and Engineering