Abstract
Imbalanced data sets often degrade the performance of a conventional support vector machine (SVM). To address this problem, we adopt two strategies: modifying the data distribution and adjusting the classifier. Both the minority and majority classes are resampled to improve generalization. For the minority class, a one-class support vector machine combined with the synthetic minority oversampling technique (SMOTE) is used to oversample the support vector instances. For the majority class, we propose a new method that decomposes the class into clusters and removes two clusters, selected by a distance measure, to lessen the effect of outliers. The remaining clusters, together with the oversampled minority patterns, are used to build an SVM ensemble, which achieves better performance by considering potentially suboptimal solutions. Experimental results on benchmark data sets illustrate the effectiveness of the proposed method.
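The pipeline described above can be sketched in a few steps: (1) fit a one-class SVM on the minority class and SMOTE-interpolate around its support vectors, (2) cluster the majority class and drop two clusters by a distance rule, (3) train one SVM per remaining cluster against the oversampled minority data and combine them by voting. The sketch below is a minimal illustration, not the authors' implementation; the specific distance rule for discarding clusters (centroid distance to the minority mean), the cluster count `k = 5`, and all hyperparameters are assumptions for this toy example.

```python
import numpy as np
from sklearn.svm import SVC, OneClassSVM
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Toy imbalanced data: 200 majority samples (label 0), 20 minority (label 1).
X_maj = rng.normal(0.0, 1.0, size=(200, 2))
X_min = rng.normal(2.5, 0.5, size=(20, 2))

# Step 1: oversample the minority class. A one-class SVM selects the
# informative (support-vector) minority points; SMOTE-style interpolation
# then generates synthetic samples between each of them and its nearest
# minority neighbour.
ocsvm = OneClassSVM(nu=0.5, gamma="scale").fit(X_min)
sv = X_min[ocsvm.support_]                # boundary minority instances
synth = []
for x in sv:
    d = np.linalg.norm(X_min - x, axis=1)
    nn = X_min[np.argsort(d)[1]]          # nearest neighbour, excluding x itself
    synth.append(x + rng.uniform(0.0, 1.0) * (nn - x))
X_min_os = np.vstack([X_min, np.asarray(synth)])

# Step 2: decompose the majority class into clusters and remove two of them.
# Here the two clusters whose centroids lie farthest from the minority mean
# are dropped (an assumed stand-in for the paper's distance measure).
k = 5
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_maj)
dist = np.linalg.norm(km.cluster_centers_ - X_min.mean(axis=0), axis=1)
keep = np.argsort(dist)[: k - 2]

# Step 3: build one SVM per remaining majority cluster, each trained against
# the full oversampled minority set; combine the members by majority vote.
clfs = []
for c in keep:
    Xc = X_maj[km.labels_ == c]
    X = np.vstack([Xc, X_min_os])
    y = np.r_[np.zeros(len(Xc)), np.ones(len(X_min_os))]
    clfs.append(SVC(kernel="rbf", gamma="scale").fit(X, y))

def predict(Xq):
    """Majority vote over the ensemble members."""
    votes = np.stack([clf.predict(Xq) for clf in clfs])
    return (votes.mean(axis=0) >= 0.5).astype(int)
```

Training each member on one majority cluster keeps every sub-problem roughly balanced, which is the point of the decomposition: no single SVM ever sees the full majority/minority ratio.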
Cite this article
Tian, J., Gu, H. & Liu, W. Imbalanced classification using support vector machine ensemble. Neural Comput & Applic 20, 203–209 (2011). https://doi.org/10.1007/s00521-010-0349-9