Combined Effects of Class Imbalance and Class Overlap on Instance-Based Classification

  • Conference paper
Intelligent Data Engineering and Automated Learning – IDEAL 2006 (IDEAL 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4224)

Abstract

In real-world applications, it has often been observed that class imbalance (significant differences in class prior probabilities) may seriously degrade classifier performance, particularly on patterns belonging to the less represented classes. This effect becomes especially significant in instance-based learning because of its reliance on a dissimilarity measure. We analyze the effects of class imbalance on classifier performance and how class overlap influences those effects, as well as its influence on several techniques proposed in the literature to tackle class imbalance. In addition, we study how these methods affect performance on both classes, not only on the minority class as is usual.
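
As a rough illustration of the setting described above, and not the authors' experimental protocol, the following sketch trains a plain k-nearest-neighbour classifier on an imbalanced, overlapping two-class problem and reports accuracy separately for each class. Everything in it is an assumption made for illustration: the synthetic 2-D Gaussian data, the 10:1 class ratio, the Euclidean distance, k = 3, and the helper names `make_data`, `knn_predict` and `per_class_accuracy`.

```python
import random


def make_data(n_major, n_minor, shift, rng):
    """Two 2-D Gaussian classes with unit variance.

    Class 0 (majority) is centred at (0, 0) and class 1 (minority) at
    (shift, 0); a smaller `shift` means more class overlap.
    """
    major = [((rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)), 0)
             for _ in range(n_major)]
    minor = [((rng.gauss(shift, 1.0), rng.gauss(0.0, 1.0)), 1)
             for _ in range(n_minor)]
    return major + minor


def knn_predict(train, x, k=3):
    """Majority vote among the k training points closest to x
    (squared Euclidean distance as the dissimilarity measure)."""
    nearest = sorted(
        train,
        key=lambda p: (p[0][0] - x[0]) ** 2 + (p[0][1] - x[1]) ** 2,
    )[:k]
    votes_for_minor = sum(label for _, label in nearest)
    return 1 if 2 * votes_for_minor > k else 0


def per_class_accuracy(train, test, k=3):
    """Fraction of correctly classified test points, per true class."""
    hits = {0: 0, 1: 0}
    totals = {0: 0, 1: 0}
    for x, y in test:
        totals[y] += 1
        if knn_predict(train, x, k) == y:
            hits[y] += 1
    return {c: hits[c] / totals[c] for c in totals}


rng = random.Random(0)
# Imbalanced training set (10:1) with strongly overlapping classes.
train = make_data(n_major=400, n_minor=40, shift=1.0, rng=rng)
# Balanced test set, so both per-class estimates use equal sample sizes.
test = make_data(n_major=200, n_minor=200, shift=1.0, rng=rng)

acc = per_class_accuracy(train, test)
print("majority-class accuracy:", acc[0])
print("minority-class accuracy:", acc[1])
```

Varying `shift` (overlap) and the ratio of `n_major` to `n_minor` (imbalance) independently gives a simple way to see how the two factors interact, and reporting both entries of `acc` reflects the abstract's point that performance should be examined on both classes.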

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

García, V., Alejo, R., Sánchez, J.S., Sotoca, J.M., Mollineda, R.A. (2006). Combined Effects of Class Imbalance and Class Overlap on Instance-Based Classification. In: Corchado, E., Yin, H., Botti, V., Fyfe, C. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2006. IDEAL 2006. Lecture Notes in Computer Science, vol 4224. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11875581_45

  • DOI: https://doi.org/10.1007/11875581_45

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-45485-4

  • Online ISBN: 978-3-540-45487-8

  • eBook Packages: Computer Science; Computer Science (R0)
