A Proposal of Evolutionary Prototype Selection for Class Imbalance Problems

  • Conference paper
Intelligent Data Engineering and Automated Learning – IDEAL 2006 (IDEAL 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4224)

Abstract

Unbalanced data in a classification problem appear when some classes have many more instances than others. Several solutions have been proposed to address this problem at the data level through under-sampling. The aim of this work is to propose evolutionary prototype selection algorithms that tackle the problem of unbalanced data by means of a new fitness function. The results show that balancing the data through evolutionary under-sampling outperforms previously proposed under-sampling methods in classification accuracy, while also obtaining reduced subsets and a good balance between classes.
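
The abstract does not spell out the new fitness function or the evolutionary model, so the following is only a minimal sketch of evolutionary under-sampling under stated assumptions. A binary chromosome marks which majority-class instances to keep (minority instances are always kept). A hypothetical fitness rewards the geometric mean of per-class leave-one-out 1-NN recall while penalising any imbalance left in the selected subset. The search is a plain generational genetic algorithm (binary tournament, uniform crossover, bit-flip mutation, elitism), which is not necessarily the evolutionary model used in the paper. All function names and parameters (`evolutionary_undersample`, `fitness`, `balance_weight`, `p_mut`) are illustrative.

```python
import math
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def loo_1nn_gmean(X, y, keep):
    """Geometric mean of per-class leave-one-out 1-NN recall over the whole
    training set, using only instances flagged in `keep` as prototypes."""
    ref = [j for j, k in enumerate(keep) if k]
    hits, totals = Counter(), Counter()
    for i, xi in enumerate(X):
        candidates = [j for j in ref if j != i]  # an instance is never its own neighbour
        if not candidates:
            return 0.0
        nearest = min(candidates, key=lambda j: sq_dist(xi, X[j]))
        totals[y[i]] += 1
        hits[y[i]] += int(y[nearest] == y[i])
    recalls = [hits[c] / totals[c] for c in totals]
    return math.prod(recalls) ** (1 / len(recalls))


def fitness(X, y, minority, keep, balance_weight=0.2):
    """HYPOTHETICAL fitness: accuracy term minus a penalty for the class
    imbalance remaining in the selected subset (not the paper's formula)."""
    kept = Counter(y[j] for j, k in enumerate(keep) if k)
    n_min = kept[minority]
    n_maj = sum(v for c, v in kept.items() if c != minority)
    if n_maj == 0:
        return 0.0
    imbalance = abs(n_maj - n_min) / (n_maj + n_min)
    return loo_1nn_gmean(X, y, keep) - balance_weight * imbalance


def evolutionary_undersample(X, y, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Evolve a binary mask over the majority-class instances and return the
    sorted indices of the selected subset (minority instances are always kept)."""
    rng = random.Random(seed)
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    maj_idx = [i for i, c in enumerate(y) if c != minority]

    def decode(genes):
        keep = [c == minority for c in y]
        for g, i in zip(genes, maj_idx):
            keep[i] = g
        return keep

    cache = {}

    def score(genes):
        key = tuple(genes)
        if key not in cache:
            cache[key] = fitness(X, y, minority, decode(genes))
        return cache[key]

    def tournament(pop):
        a, b = rng.sample(pop, 2)  # binary tournament
        return a if score(a) >= score(b) else b

    pop = [[rng.random() < 0.5 for _ in maj_idx] for _ in range(pop_size)]
    for _ in range(generations):
        children = [max(pop, key=score)]  # elitism: carry the best mask over
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]  # uniform crossover
            child = [(not g) if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            children.append(child)
        pop = children
    keep = decode(max(pop, key=score))
    return [i for i, k in enumerate(keep) if k]
```

Calling `evolutionary_undersample(X, y)` on a list of feature tuples `X` and labels `y` returns the indices of the retained instances. A 1-NN classifier would then use only those instances as prototypes. Raising `balance_weight` pushes the search toward an equal class split, at some cost to the accuracy term.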

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

García, S., Cano, J.R., Fernández, A., Herrera, F. (2006). A Proposal of Evolutionary Prototype Selection for Class Imbalance Problems. In: Corchado, E., Yin, H., Botti, V., Fyfe, C. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2006. IDEAL 2006. Lecture Notes in Computer Science, vol 4224. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11875581_168

  • DOI: https://doi.org/10.1007/11875581_168

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-45485-4

  • Online ISBN: 978-3-540-45487-8

  • eBook Packages: Computer Science, Computer Science (R0)