Abstract
Imbalanced data classification is a common and difficult problem in practice. Traditional SVM-based classification algorithms usually cannot classify highly imbalanced data accurately, and sampling strategies are widely used to address this. In this paper, we propose a novel undersampling method, granular weighted SVMs-repetitive undersampling (GWSVM-RU), for highly imbalanced classification. It is a weighted-SVM version of the granular SVMs-repetitive undersampling (GSVM-RU) method proposed by Yuchun Tang et al. We perform undersampling by repeatedly extracting negative information granules, which are obtained with the plain SVM algorithm, and then combining the negative and positive granules to compose new training data sets. In this way we rebalance the original imbalanced data sets and then build new models with weighted SVMs to predict the test data set. We also compare against four other rebalancing heuristics: cost-sensitive learning, undersampling, oversampling and GSVM-RU. Our approach achieves higher classification performance as measured by G-Mean, F-Measure and AUC-ROC. Both theoretical analysis and experiments indicate that our approach outperforms the other methods.
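The three evaluation metrics named in the abstract can be illustrated with a minimal sketch. It uses the standard textbook definitions (G-Mean as the geometric mean of sensitivity and specificity, F-Measure as the harmonic mean of precision and recall, AUC-ROC as the probability that a random positive is scored above a random negative), which may differ in detail from the variants used in the paper. The function names and the toy labels and scores are hypothetical and chosen only for illustration.

```python
from math import sqrt


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary problem."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sqrt(sensitivity * specificity)


def f_measure(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall (F1)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def auc_roc(y_true, scores, positive=1):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy imbalanced data (hypothetical): 2 positives, 8 negatives.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
scores = [0.9, 0.4, 0.1, 0.2, 0.3, 0.1, 0.2, 0.3, 0.1, 0.8]

print(g_mean(y_true, y_pred))     # about 0.661
print(f_measure(y_true, y_pred))  # 0.5
print(auc_roc(y_true, scores))    # 0.9375
```

On the same toy labels, a classifier that predicts every sample as negative would reach 80% accuracy but a G-Mean and F-Measure of 0, which is why such metrics are preferred over accuracy for highly imbalanced data.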
References
Akbani, R., Kwek, S., Japkowicz, N.: Applying support vector machines to imbalanced datasets. In: Boulicaut, J.-F., Esposito, F., Giannotti, F., Pedreschi, D. (eds.) ECML 2004. LNCS (LNAI), vol. 3201, pp. 39–50. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-30115-8_7
Bradley, A.P.: The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recogn. 30(7), 1145–1159 (1997)
Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. (TIST) 2(3), 27 (2011)
Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002)
Chawla, N.V., Japkowicz, N., Kotcz, A.: Special issue on learning from imbalanced data sets. ACM SIGKDD Explor. Newsl. 6(1), 1–6 (2004)
Chen, C., Liaw, A., Breiman, L., et al.: Using random forest to learn imbalanced data, vol. 110, pp. 1–12. University of California, Berkeley (2004)
Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)
Domingos, P.: MetaCost: a general method for making classifiers cost-sensitive. In: KDD, vol. 99, pp. 155–164 (1999)
Japkowicz, N., Stephen, S.: The class imbalance problem: a systematic study. Intell. Data Anal. 6(5), 429–449 (2002)
Keerthi, S.S., Lin, C.J.: Asymptotic behaviors of support vector machines with Gaussian kernel. Neural Comput. 15(7), 1667–1689 (2003)
Kubat, M., Matwin, S., et al.: Addressing the curse of imbalanced training sets: one-sided selection. In: ICML, vol. 97, pp. 179–186. Nashville, USA (1997)
Tang, Y., Zhang, Y.Q.: Granular SVM with repetitive undersampling for highly imbalanced protein homology prediction. In: 2006 IEEE International Conference on Granular Computing, pp. 457–460. IEEE (2006)
Tang, Y., Zhang, Y.Q., Chawla, N.V., Krasser, S.: SVMs modeling for highly imbalanced classification. IEEE Trans. Syst. Man Cybern. Part B Cybern. 39(1), 281–288 (2009)
Vapnik, V.: Statistical Learning Theory, pp. 156–160. Wiley, New York (1998)
Vapnik, V.N.: An overview of statistical learning theory. IEEE Trans. Neural Networks 10(5), 988–999 (1999)
Yao, Y., Zhou, B.: A logic language of granular computing. In: 6th IEEE International Conference on Cognitive Informatics, pp. 178–185. IEEE (2007)
Acknowledgement
We thank our anonymous reviewers for their invaluable feedback. This work was supported by the National Natural Science Foundation of China (Grant No. 61502486).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Qi, B., Jiang, J., Shi, Z., Li, M., Fan, W. (2019). A Novel Method for Highly Imbalanced Classification with Weighted Support Vector Machine. In: Douligeris, C., Karagiannis, D., Apostolou, D. (eds) Knowledge Science, Engineering and Management. KSEM 2019. Lecture Notes in Computer Science, vol 11775. Springer, Cham. https://doi.org/10.1007/978-3-030-29551-6_24
DOI: https://doi.org/10.1007/978-3-030-29551-6_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29550-9
Online ISBN: 978-3-030-29551-6
eBook Packages: Computer Science, Computer Science (R0)