Learning from Imbalanced Sets through Resampling and Weighting

  • Conference paper
Pattern Recognition and Image Analysis (IbPRIA 2003)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2652)

Abstract

The problem of imbalanced training sets in supervised pattern recognition methods is receiving growing attention. A training sample is imbalanced when one class is represented by a large number of examples while the other is represented by only a few. This situation, which arises in several practical domains, has been observed to cause a significant deterioration in classification accuracy, particularly for patterns belonging to the less represented classes. In this paper, we introduce a new approach to designing an instance-based classifier for such imbalanced environments.
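The deterioration described above can be made concrete with a small arithmetic sketch. It is not the paper's method or data: the 95/5 class split, the label names, and the helper functions `overall_accuracy` and `per_class_recall` are all assumptions chosen for illustration. It shows how a classifier that always predicts the well-represented class can score high overall accuracy while recognising none of the less represented class.

```python
def overall_accuracy(y_true, y_pred):
    """Fraction of all examples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """Map each true label to the fraction of its examples predicted correctly."""
    totals, hits = {}, {}
    for t, p in zip(y_true, y_pred):
        totals[t] = totals.get(t, 0) + 1
        hits[t] = hits.get(t, 0) + (1 if t == p else 0)
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical imbalanced sample: 95 majority examples, 5 minority examples.
y_true = ["majority"] * 95 + ["minority"] * 5
# A degenerate classifier that always answers with the majority class.
y_pred = ["majority"] * 100

acc = overall_accuracy(y_true, y_pred)
recalls = per_class_recall(y_true, y_pred)

print(acc)      # 0.95
print(recalls)  # {'majority': 1.0, 'minority': 0.0}
```

In this illustrative setting, per-class figures expose what the single overall accuracy value hides, which is one reason imbalanced problems are usually evaluated class by class.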

Partially supported by grants 32016-A (Mexican CONACyT), TIC2000-1703-C03-03 (Spanish CICYT), and P1-1B2002-07 (Fundació Caixa Castelló-Bancaixa).



Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Barandela, R., Sánchez, J.S., García, V., Ferri, F.J. (2003). Learning from Imbalanced Sets through Resampling and Weighting. In: Perales, F.J., Campilho, A.J.C., de la Blanca, N.P., Sanfeliu, A. (eds) Pattern Recognition and Image Analysis. IbPRIA 2003. Lecture Notes in Computer Science, vol 2652. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-44871-6_10

  • DOI: https://doi.org/10.1007/978-3-540-44871-6_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40217-6

  • Online ISBN: 978-3-540-44871-6

  • eBook Packages: Springer Book Archive
