An Improved Algorithm for SVMs Classification of Imbalanced Data Sets

  • Conference paper
Engineering Applications of Neural Networks (EANN 2009)

Abstract

Support Vector Machines (SVMs) have strong theoretical foundations and have shown excellent empirical success in many pattern recognition and data mining applications. However, when they are trained on imbalanced data sets, where the examples of the target (minority) class are outnumbered by the examples of the non-target (majority) class, SVM classifiers perform less well. Small and heavily imbalanced data sets are common in medical diagnosis and text classification, for instance. In this paper, we propose the Boundary Elimination and Domination (BED) algorithm to improve SVM class-prediction accuracy in applications with imbalanced class distributions. BED is an informative resampling strategy that operates in input space. To balance the class distributions, the algorithm uses density information from the training set to remove noisy examples of the majority class and to generate new synthetic examples of the minority class. In our experiments, we compared BED with the original SVM and with the Synthetic Minority Oversampling Technique (SMOTE), a popular resampling strategy in the literature. Our results demonstrate that the new approach improves SVM classifier performance on several real-world imbalanced problems.
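
The abstract describes the resampling idea only at a high level, so the following is a minimal illustrative sketch and not the authors' BED procedure. It assumes a k-nearest-neighbour estimate of local class density, a hypothetical threshold `max_minority_frac` for deciding that a majority example is noisy, and SMOTE-style linear interpolation for the synthetic minority examples. The function names, the label convention (1 for minority, 0 for majority), and the toy data are all assumptions made for illustration.

```python
import math
import random


def knn_indices(points, i, k):
    """Return indices of the k nearest neighbours of points[i], excluding i."""
    dists = sorted((math.dist(points[i], p), j)
                   for j, p in enumerate(points) if j != i)
    return [j for _, j in dists[:k]]


def density_resample(X, y, k=3, max_minority_frac=0.5, seed=0):
    """Illustrative density-based resampling (assumed scheme, not BED itself).

    Labels are assumed to be 1 for the minority class and 0 for the majority.
    """
    rng = random.Random(seed)

    # Step 1: drop majority examples whose neighbourhood is mostly minority,
    # treating them as noise near or inside the minority region.
    keep = []
    for i, label in enumerate(y):
        if label == 0:
            nbrs = knn_indices(X, i, k)
            minority_frac = sum(y[j] for j in nbrs) / len(nbrs)
            if minority_frac > max_minority_frac:
                continue
        keep.append(i)
    X_kept = [X[i] for i in keep]
    y_kept = [y[i] for i in keep]

    # Step 2: interpolate between random pairs of minority examples until
    # both classes have the same number of examples.
    minority = [x for x, label in zip(X_kept, y_kept) if label == 1]
    n_majority = len(y_kept) - len(minority)
    synthetic = []
    while len(minority) >= 2 and len(minority) + len(synthetic) < n_majority:
        a, b = rng.sample(minority, 2)
        t = rng.random()
        synthetic.append(tuple(ai + t * (bi - ai) for ai, bi in zip(a, b)))

    return X_kept + synthetic, y_kept + [1] * len(synthetic)


# Toy data: a majority cluster near the origin, a minority cluster near (5, 5),
# and one majority point sitting inside the minority cluster.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.2, 0.8), (5.1, 5.0),
     (5, 5), (5, 6), (6, 5)]
y = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
X_new, y_new = density_resample(X, y, k=3)
```

On the toy data, the majority point at (5.1, 5.0) has only minority neighbours and is removed, and three synthetic minority points are added so that both classes end up with six examples. The balanced set could then be passed to any SVM trainer.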

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Castro, C.L., Carvalho, M.A., Braga, A.P. (2009). An Improved Algorithm for SVMs Classification of Imbalanced Data Sets. In: Palmer-Brown, D., Draganova, C., Pimenidis, E., Mouratidis, H. (eds) Engineering Applications of Neural Networks. EANN 2009. Communications in Computer and Information Science, vol 43. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03969-0_11

  • DOI: https://doi.org/10.1007/978-3-642-03969-0_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-03968-3

  • Online ISBN: 978-3-642-03969-0

  • eBook Packages: Computer Science, Computer Science (R0)
