A Novel Method for Highly Imbalanced Classification with Weighted Support Vector Machine

Qi, Biao; Jiang, Jianguo; Shi, Zhixin; Li, Meimei; Fan, Wei

doi:10.1007/978-3-030-29551-6_24

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11775)

Included in the following conference series:

International Conference on Knowledge Science, Engineering and Management

Abstract

In real-world applications, imbalanced data classification is unavoidable and difficult to solve. Traditional SVM-based classification algorithms usually cannot classify highly imbalanced data accurately, and sampling strategies are widely used to mitigate the problem. In this paper, we put forward a novel undersampling method, granular weighted SVMs-repetitive undersampling (GWSVM-RU), for highly imbalanced classification. It is a weighted-SVM version of the granular SVMs-repetitive undersampling (GSVM-RU) method proposed by Yuchun Tang et al. We perform undersampling by repeatedly extracting negative information granules, which are obtained with the standard SVM algorithm, and then combining the negative and positive granules to compose new training data sets. This rebalances the original imbalanced data sets, and we then build new models with weighted SVMs to predict the testing data set. We also compare against four other rebalancing heuristics, namely cost-sensitive learning, undersampling, oversampling and GSVM-RU; our approach achieves higher classification performance as measured by the evaluation metrics G-Mean, F-Measure and AUC-ROC. Theoretical analysis and experiments indicate that our approach outperforms the other methods.
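
As a rough illustration of why the abstract evaluates with G-Mean, F-Measure and AUC-ROC rather than plain accuracy, the sketch below computes the three metrics from their standard textbook definitions. The function names and the toy labels are hypothetical and are not taken from the paper, whose exact formulation may differ in detail.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn), treating `positive` as the minority class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity (recall on positives) and specificity."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


def f_measure(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def auc_roc(y_true, scores, positive=1):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half (the rank-statistic form)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (made up for illustration): 2 positives, 8 negatives.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
all_negative = [0] * 10          # trivial classifier, 80% accurate
print(g_mean(y_true, all_negative), f_measure(y_true, all_negative))
```

On this toy data, a classifier that always predicts the majority class is 80% accurate yet scores zero on both G-Mean and F-Measure, which is the failure mode these metrics are meant to expose.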

Notes

1. http://www.matlabsky.com/thread-17936-1-1.html

References

Akbani, R., Kwek, S., Japkowicz, N.: Applying support vector machines to imbalanced datasets. In: Boulicaut, J.-F., Esposito, F., Giannotti, F., Pedreschi, D. (eds.) ECML 2004. LNCS (LNAI), vol. 3201, pp. 39–50. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-30115-8_7
Bradley, A.P.: The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recogn. 30(7), 1145–1159 (1997)
Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. (TIST) 2(3), 27 (2011)
Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002)
Chawla, N.V., Japkowicz, N., Kotcz, A.: Special issue on learning from imbalanced data sets. ACM SIGKDD Explor. Newsl. 6(1), 1–6 (2004)
Chen, C., Liaw, A., Breiman, L., et al.: Using random forest to learn imbalanced data, vol. 110, pp. 1–12. University of California, Berkeley (2004)
Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)
Domingos, P.: MetaCost: a general method for making classifiers cost-sensitive. KDD 99, 155–164 (1999)
Japkowicz, N., Stephen, S.: The class imbalance problem: a systematic study. Intell. Data Anal. 6(5), 429–449 (2002)
Keerthi, S.S., Lin, C.J.: Asymptotic behaviors of support vector machines with Gaussian kernel. Neural Comput. 15(7), 1667–1689 (2003)
Kubat, M., Matwin, S., et al.: Addressing the curse of imbalanced training sets: one-sided selection. In: ICML, vol. 97, pp. 179–186. Nashville, USA (1997)
Tang, Y., Zhang, Y.Q.: Granular SVM with repetitive undersampling for highly imbalanced protein homology prediction. In: 2006 IEEE International Conference on Granular Computing, pp. 457–460. IEEE (2006)
Tang, Y., Zhang, Y.Q., Chawla, N.V., Krasser, S.: SVMs modeling for highly imbalanced classification. IEEE Trans. Syst. Man Cybern. Part B Cybern. 39(1), 281–288 (2009)
Vapnik, V.: Statistical Learning Theory, pp. 156–160. Wiley, New York (1998)
Vapnik, V.N.: An overview of statistical learning theory. IEEE Trans. Neural Networks 10(5), 988–999 (1999)
Yao, Y., Zhou, B.: A logic language of granular computing. In: 6th IEEE International Conference on Cognitive Informatics, pp. 178–185. IEEE (2007)

Acknowledgement

We thank our anonymous reviewers for their invaluable feedback. This work was supported by the National Natural Science Foundation of China (Grant No. 61502486).

Author information

Authors and Affiliations

Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
Biao Qi, Jianguo Jiang, Zhixin Shi, Meimei Li & Wei Fan
School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China
Biao Qi, Jianguo Jiang, Zhixin Shi, Meimei Li & Wei Fan

Corresponding author

Correspondence to Zhixin Shi.

Editor information

Editors and Affiliations

University of Piraeus, Piraeus, Greece
Christos Douligeris
University of Vienna, Vienna, Austria
Dimitris Karagiannis
University of Piraeus, Piraeus, Greece
Dimitris Apostolou

About this paper

Cite this paper

Qi, B., Jiang, J., Shi, Z., Li, M., Fan, W. (2019). A Novel Method for Highly Imbalanced Classification with Weighted Support Vector Machine. In: Douligeris, C., Karagiannis, D., Apostolou, D. (eds) Knowledge Science, Engineering and Management. KSEM 2019. Lecture Notes in Computer Science, vol 11775. Springer, Cham. https://doi.org/10.1007/978-3-030-29551-6_24

DOI: https://doi.org/10.1007/978-3-030-29551-6_24
Published: 21 August 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29550-9
Online ISBN: 978-3-030-29551-6
eBook Packages: Computer Science, Computer Science (R0)
