Constructing support vector machine ensemble with segmentation for imbalanced datasets

Li, Qian; Yang, Bing; Li, Yi; Deng, Naiyang; Jing, Ling

doi:10.1007/s00521-012-1041-z

Original Article
Published: 10 July 2012

Volume 22, pages 249–256, (2013)

Neural Computing and Applications

Qian Li¹, Bing Yang¹, Yi Li², Naiyang Deng¹ & Ling Jing¹

Abstract

This paper proposes a novel method for classifying imbalanced datasets, the ensemble support vector machine with segmentation (SeEn–SVM). A vector quantization algorithm segments the majority class, generating several smaller datasets that are less imbalanced than the original one, and two different weighting functions are proposed to integrate the results of the base classifiers. The goal of the SeEn–SVM algorithm is to improve prediction accuracy on the minority class, which is usually the class of greater interest. SeEn–SVM is applied to six UCI datasets, and the results confirm that it performs better than previously proposed methods for the imbalance problem.
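
The segmentation and combination steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the quantizer, how segments are paired with the minority class, or the two weighting functions. The sketch assumes plain k-means as a stand-in vector quantizer, pairs every majority segment with the full minority class, and uses placeholder vote weights. The names `quantize`, `segment_majority`, and `weighted_vote` are hypothetical, and training an SVM on each sub-dataset is left out.

```python
import random


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(points, k, iters=20, seed=0):
    """Plain k-means used as a stand-in vector quantizer (an assumption).

    Returns one codeword index per point.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest codeword.
        assign = [min(range(k), key=lambda j: sqdist(p, centers[j]))
                  for p in points]
        # Move each codeword to the mean of its cell.
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # keep the old codeword if a cell is empty
                centers[j] = tuple(sum(c) / len(members)
                                   for c in zip(*members))
    return assign


def segment_majority(majority, minority, k):
    """Build one sub-dataset per non-empty majority segment.

    Assumption: every segment is paired with the full minority class.
    Labels: +1 = minority, -1 = majority.
    """
    assign = quantize(majority, k)
    subsets = []
    for j in range(k):
        segment = [p for p, a in zip(majority, assign) if a == j]
        if segment:
            X = segment + minority
            y = [-1] * len(segment) + [1] * len(minority)
            subsets.append((X, y))
    return subsets


def weighted_vote(votes, weights):
    """Combine base-classifier votes (+1/-1) by a weighted sum.

    The weights are placeholders, not the paper's weighting functions.
    Ties go to the minority class (+1), which is also an assumption.
    """
    total = sum(w * v for w, v in zip(weights, votes))
    return 1 if total >= 0 else -1


# Toy data: 60 majority points around three centres, 10 minority points.
rng = random.Random(0)
majority = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
            for cx, cy in [(0, 0), (4, 0), (0, 4)] for _ in range(20)]
minority = [(rng.gauss(2, 0.5), rng.gauss(2, 0.5)) for _ in range(10)]

subsets = segment_majority(majority, minority, k=3)
for X, y in subsets:
    print(f"majority:minority = {y.count(-1)}:{y.count(1)}")
```

Each sub-dataset holds only part of the majority class, so its imbalance ratio cannot exceed the original 6:1. In a full pipeline, one base classifier would be trained per sub-dataset and its vote on a new point passed to `weighted_vote`.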

Acknowledgments

This work is supported by the National Natural Science Foundation of China (No. 10971223, No. 11071252) and Chinese Universities Scientific Fund (2011JS039, 2012YJ130). The authors also gratefully acknowledge the helpful comments and suggestions of the reviewers, which have improved the presentation.

Author information

Authors and Affiliations

College of Science, China Agricultural University, Beijing, 100083, People’s Republic of China
Qian Li, Bing Yang, Naiyang Deng & Ling Jing
Department of Mathematics, Beijing University of Posts and Telecommunications, Beijing, 100876, People’s Republic of China
Yi Li

Corresponding author

Correspondence to Ling Jing.

About this article

Cite this article

Li, Q., Yang, B., Li, Y. et al. Constructing support vector machine ensemble with segmentation for imbalanced datasets. Neural Comput & Applic 22 (Suppl 1), 249–256 (2013). https://doi.org/10.1007/s00521-012-1041-z

Received: 27 June 2011
Accepted: 22 June 2012
Published: 10 July 2012
Issue Date: May 2013
DOI: https://doi.org/10.1007/s00521-012-1041-z
