Imbalanced classification using support vector machine ensemble

  • Original Article
  • Neural Computing and Applications

Abstract

Imbalanced data sets often degrade the performance of a conventional support vector machine (SVM). To address this problem, we combine two strategies: modifying the data distribution and adjusting the classifier. Both the minority and majority classes are resampled to improve generalization ability. For the minority class, a one-class SVM model combined with the synthetic minority oversampling technique (SMOTE) is used to oversample the support vector instances. For the majority class, we propose a new method that decomposes the class into clusters and removes two clusters, selected by a distance measure, to lessen the effect of outliers. The remaining clusters, together with the oversampled minority patterns, are used to build an SVM ensemble, which can achieve better performance by combining potentially suboptimal component classifiers. Experimental results on benchmark data sets illustrate the effectiveness of the proposed method.
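As a rough illustration of the majority-class decomposition step described above: the abstract does not specify the clustering algorithm or the exact distance measure, so this minimal NumPy sketch assumes plain k-means (Lloyd's algorithm) and uses the distance from each cluster center to the minority-class mean as a stand-in criterion for identifying the two clusters to discard. The function names (`kmeans`, `prune_majority`) are hypothetical, not from the paper.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's k-means: returns per-point cluster labels and centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        # Distance from every point to every center, shape (n, k).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

def prune_majority(X_maj, X_min, k=5, drop=2):
    """Cluster the majority class and drop the `drop` clusters whose centers
    lie farthest from the minority-class mean (an assumed proxy for the
    paper's distance measure), treating them as outlier regions.  Each
    surviving subset would then train one SVM of the ensemble, paired with
    the oversampled minority patterns."""
    labels, centers = kmeans(X_maj, k)
    ref = X_min.mean(axis=0)
    dist = np.linalg.norm(centers - ref, axis=1)
    keep = np.argsort(dist)[: k - drop]   # the k - drop closest clusters survive
    return [X_maj[labels == j] for j in keep]
```

Under this sketch, each returned subset plus the oversampled minority class forms the training set for one ensemble member; the final prediction would aggregate the members' outputs, e.g. by majority vote.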

Author information

Corresponding author

Correspondence to Jiang Tian.

About this article

Cite this article

Tian, J., Gu, H. & Liu, W. Imbalanced classification using support vector machine ensemble. Neural Comput & Applic 20, 203–209 (2011). https://doi.org/10.1007/s00521-010-0349-9
