An Improved Algorithm for SVMs Classification of Imbalanced Data Sets

Castro, Cristiano Leite; Carvalho, Mateus Araujo; Braga, Antônio Padua

doi:10.1007/978-3-642-03969-0_11

Part of the book series: Communications in Computer and Information Science (CCIS, volume 43)

Included in the following conference series:

International Conference on Engineering Applications of Neural Networks

Abstract

Support Vector Machines (SVMs) have strong theoretical foundations and have achieved excellent empirical results in many pattern recognition and data mining applications. However, when an SVM is trained on an imbalanced data set, in which the examples of the target (minority) class are outnumbered by those of the non-target (majority) class, its classification performance degrades. Small, heavily imbalanced data sets are common in applications such as medical diagnosis and text classification. In this paper, we propose the Boundary Elimination and Domination (BED) algorithm to improve SVM class-prediction accuracy in applications with imbalanced class distributions. BED is an informative resampling strategy that operates in input space: to balance the class distributions, it uses density information from the training set to remove noisy examples of the majority class and to generate new synthetic examples of the minority class. In our experiments, we compared BED with the standard SVM and with the Synthetic Minority Oversampling Technique (SMOTE), a popular resampling strategy in the literature. Our results show that the new approach improves SVM classifier performance on several real-world imbalanced problems.
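The abstract describes BED only at a high level (density information is used to remove noisy majority examples and to synthesize minority examples); its exact rules are given in the full chapter, not here. As a hedged illustration of those two ingredients, the sketch below pairs a k-nearest-neighbor cleaning step for the majority class with SMOTE-style interpolation for the minority class. This is not the authors' BED algorithm: the cleaning rule (an edited-nearest-neighbor style filter), the neighborhood size `k`, the `threshold`, and all function names are assumptions chosen for illustration.

```python
import numpy as np


def knn_indices(X, k):
    """Indices of the k nearest neighbors (Euclidean) of every row of X, excluding itself."""
    # Pairwise squared distances, shape (n, n).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]


def clean_majority(X, y, k=5, threshold=0.6, minority=1):
    """Hypothetical noise filter (assumption, NOT the BED rule): drop majority examples
    whose k-NN neighborhood consists mostly of minority examples."""
    k = min(k, len(X) - 1)
    nn = knn_indices(X, k)
    minority_frac = (y[nn] == minority).mean(axis=1)
    # Every minority example is kept; majority examples survive only below the threshold.
    keep = (y == minority) | (minority_frac < threshold)
    return X[keep], y[keep]


def smote(X_min, n_new, k=5, rng=None):
    """SMOTE-style oversampling (Chawla et al., 2002): each synthetic point lies on the
    segment between a random minority example and one of its k minority neighbors."""
    if len(X_min) < 2:
        raise ValueError("need at least two minority examples")
    rng = np.random.default_rng(rng)
    k = min(k, len(X_min) - 1)
    nn = knn_indices(X_min, k)
    base = rng.integers(0, len(X_min), size=n_new)
    neighbor = nn[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation weight in [0, 1)
    return X_min[base] + gap * (X_min[neighbor] - X_min[base])


def resample(X, y, k=5, threshold=0.6, minority=1, rng=None):
    """Hypothetical pipeline: clean the majority class, then oversample the minority
    class until both classes have the same number of examples."""
    X_c, y_c = clean_majority(X, y, k, threshold, minority)
    X_min = X_c[y_c == minority]
    n_new = int((y_c != minority).sum()) - len(X_min)
    if n_new <= 0:
        return X_c, y_c
    synth = smote(X_min, n_new, k, rng)
    X_out = np.vstack([X_c, synth])  # synthetic rows are appended at the end
    y_out = np.concatenate([y_c, np.full(n_new, minority, dtype=y_c.dtype)])
    return X_out, y_out
```

For example, `resample(X, y, rng=0)` returns a training set in which the cleaned majority class and the oversampled minority class have equal counts; an SVM would then be trained on the returned arrays. The full pairwise distance matrix keeps the sketch short but uses quadratic memory, so it only suits small data sets such as those the abstract mentions.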

References

Boser, B.E., Guyon, I.M., Vapnik, V.: A training algorithm for optimal margin classifiers. In: Proceedings of the fifth annual workshop on Computational learning theory, pp. 144–152. ACM Press, New York (1992)
Vapnik, V.N.: The nature of statistical learning theory. Springer, New York (1995)
Cortes, C., Vapnik, V.: Support-Vector Networks. Mach. Learn. 20, 273–297 (1995)
Cristianini, N., Shawe-Taylor, J.: An introduction to support vector machines and other kernel-based learning methods. Cambridge University Press, London (2000)
Wu, G., Chang, E.Y.: KBA: kernel boundary alignment considering imbalanced data distribution. IEEE Trans. Knowl. Data Eng. 17, 786–795 (2005)
Provost, F., Fawcett, T.: Robust classification for imprecise environments. Mach. Learn. 42, 203–231 (2001)
Sun, Y., Kamel, M.S., Wong, A.K.C., Wang, Y.: Cost-sensitive boosting for classification of imbalanced data. Pattern Recognition 40, 3358–3378 (2007)
Asuncion, A., Newman, D.J.: UCI Machine Learning Repository. University of California, Irvine, School of Information and Computer Sciences, http://www.ics.uci.edu/mlearn/MLRepository.html
Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002)
Tan, P., Steinbach, M.: Introduction to Data Mining. Addison Wesley, Reading (2006)
Kubat, M., Matwin, S.: Addressing the curse of imbalanced training sets: one-sided selection. In: Proceedings of 14th International Conference on Machine Learning, pp. 179–186. Morgan Kaufmann, San Francisco (1997)
Egan, J.P.: Signal detection theory and ROC analysis. Academic Press, London (1975)
Weiss, G.M.: Mining with rarity: a unifying framework. SIGKDD Explor. Newsl. 6, 7–19 (2004)
Karakoulas, G., Shawe-Taylor, J.: Optimizing classifiers for imbalanced training sets. In: Proceedings of Conference on Advances in Neural Information Processing Systems II, pp. 253–259. MIT Press, Cambridge (1999)
Li, Y., Shawe-Taylor, J.: The SVM with uneven margins and Chinese document categorization. In: Proceedings of the 17th Pacific Asia Conference on Language, Information and Computation, pp. 216–227 (2003)
Veropoulos, K., Campbell, C., Cristianini, N.: Controlling the sensitivity of support vector machines. In: Proceedings of the International Joint Conference on Artificial Intelligence, pp. 55–60 (1999)
Joachims, T.: Learning to classify text using support vector machines: methods, theory and algorithms. Kluwer Academic Publishers, Norwell (2002)
Cristianini, N., Shawe-Taylor, J., Kandola, J.: On kernel target alignment. In: Proceedings of the Neural Information Processing Systems NIPS 2001, pp. 367–373. MIT Press, Cambridge (2002)
Kandola, J., Shawe-Taylor, J.: Refining kernels for regression and uneven classification problems. In: Proceedings of International Conference on Artificial Intelligence and Statistics. Springer, Heidelberg (2003)
Akbani, R., Kwek, S., Japkowicz, N.: Applying support vector machines to imbalanced datasets. In: Proceedings of European Conference on Machine Learning, pp. 39–50 (2004)
Vilariño, F., Spyridonos, P., Vitrià, J., Radeva, P.: Experiments with SVM and stratified sampling with an imbalanced problem: detection of intestinal contractions. In: Proceedings of International Workshop on Pattern Recognition for Crime Prevention, Security and Surveillance, pp. 783–791 (2005)
Tang, Y., Zhang, Y.Q., Chawla, N.V., Krasser, S.: SVMs modeling for highly imbalanced classification. IEEE Trans. Syst., Man, Cybern. B 39, 281–288 (2009)

Author information

Authors and Affiliations

Department of Electronics Engineering, Federal University of Minas Gerais, Av. Antônio Carlos, 6.627 Campus UFMG - Pampulha, 30.161-970, Belo Horizonte, MG, Brasil
Cristiano Leite Castro, Mateus Araujo Carvalho & Antônio Padua Braga

Editor information

Editors and Affiliations

Faculty of Computing, London Metropolitan University, 166-220 Holloway Road, N7 8DB, London, UK
Dominic Palmer-Brown
School of Computing, IT and Engineering, University of East London, Docklands Campus, 4-6 University Way, E16 2RD, London, UK
Chrisina Draganova & Haris Mouratidis
School of Computing, IT and Engineering, University of East London, London, UK
Elias Pimenidis

About this paper

Cite this paper

Castro, C.L., Carvalho, M.A., Braga, A.P. (2009). An Improved Algorithm for SVMs Classification of Imbalanced Data Sets. In: Palmer-Brown, D., Draganova, C., Pimenidis, E., Mouratidis, H. (eds) Engineering Applications of Neural Networks. EANN 2009. Communications in Computer and Information Science, vol 43. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03969-0_11

DOI: https://doi.org/10.1007/978-3-642-03969-0_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03968-3
Online ISBN: 978-3-642-03969-0
eBook Packages: Computer Science, Computer Science (R0)