Skip to main content

A Novel Method for Highly Imbalanced Classification with Weighted Support Vector Machine

  • Conference paper
  • First Online:
Knowledge Science, Engineering and Management (KSEM 2019)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 11775))

Abstract

In real life, the problem of imbalanced data classification is unavoidable and difficult to solve. Traditional SVMs based classification algorithms usually cannot classify highly imbalanced data accurately, and sampling strategies are widely used to help settle the matter. In this paper, we put forward a novel undersampling method i.e., granular weighted SVMs-repetitive under-sampling (GWSVM-RU) for highly imbalanced classification, which is a weighted SVMs version of the granular SVMs-repetitive undersampling (GSVM-RU) once proposed by Yuchun Tang et al. We complete the undersampling operation by extracting the negative information granules repetitively which are obtained through the naive SVMs algorithm, and then combine the negative and positive granules again to compose the new training data sets. Thus we rebalance the original imbalanced data sets and then build new models by weighted SVMs to predict the testing data set. Besides, we explore four other rebalance heuristic mechanisms including cost-sensitive learning, undersampling, oversampling and GSVM-RU, our approach holds the higher classification performance defined by new evaluation metrics including G-Mean, F-Measure and AUC-ROC. Theories and experiments reveal that our approach outperforms other methods.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    http://www.matlabsky.com/thread-17936-1-1.html.

References

  1. Akbani, R., Kwek, S., Japkowicz, N.: Applying support vector machines to imbalanced datasets. In: Boulicaut, J.-F., Esposito, F., Giannotti, F., Pedreschi, D. (eds.) ECML 2004. LNCS (LNAI), vol. 3201, pp. 39–50. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-30115-8_7

    Chapter  Google Scholar 

  2. Bradley, A.P.: The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recogn. 30(7), 1145–1159 (1997)

    Article  Google Scholar 

  3. Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. (TIST) 2(3), 27 (2011)

    Google Scholar 

  4. Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002)

    Article  Google Scholar 

  5. Chawla, N.V., Japkowicz, N., Kotcz, A.: Special issue on learning from imbalanced data sets. ACM Sigkdd Explor. Newsl. 6(1), 1–6 (2004)

    Article  Google Scholar 

  6. Chen, C., Liaw, A., Breiman, L., et al.: Using random forest to learn imbalanced data, vol. 110, pp. 1–12. University of California, Berkeley (2004)

    Google Scholar 

  7. Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)

    MATH  Google Scholar 

  8. Domingos, P.: MetaCost: a general method for making classifiers cost-sensitive. KDD 99, 155–164 (1999)

    Article  Google Scholar 

  9. Japkowicz, N., Stephen, S.: The class imbalance problem: a systematic study. Intell. Data Anal. 6(5), 429–449 (2002)

    Article  Google Scholar 

  10. Keerthi, S.S., Lin, C.J.: Asymptotic behaviors of support vector machines with Gaussian kernel. Neural Comput. 15(7), 1667–1689 (2003)

    Article  Google Scholar 

  11. Kubat, M., Matwin, S., et al.: Addressing the curse of imbalanced training sets: one-sided selection. In: ICML, vol. 97, pp. 179–186. Nashville, USA (1997)

    Google Scholar 

  12. Tang, Y., Zhang, Y.Q.: Granular SVM with repetitive undersampling for highly imbalanced protein homology prediction. In: 2006 IEEE International Conference on Granular Computing, pp. 457–460. IEEE (2006)

    Google Scholar 

  13. Tang, Y., Zhang, Y.Q., Chawla, N.V., Krasser, S.: SVMs modeling for highly imbalanced classification. IEEE Trans. Syst. Man Cybern. Part B Cybern. 39(1), 281–288 (2009)

    Google Scholar 

  14. Vapnik, V., Vapnik, V.: Statistical Learning Theory, pp. 156–160. Wiley, New York (1998)

    Google Scholar 

  15. Vapnik, V.N.: An overview of statistical learning theory. IEEE Trans. Neural Networks 10(5), 988–999 (1999)

    Article  Google Scholar 

  16. Yao, Y., Zhou, B.: A logic language of granular computing. In: 6th IEEE International Conference on Cognitive Informatics, pp. 178–185. IEEE (2007)

    Google Scholar 

Download references

Acknowledgement

We thank our anonymous reviewers for their invaluable feedback. This work was supported by the National Natural Science Foundation of China (Grant No.61502486)

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Zhixin Shi .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Qi, B., Jiang, J., Shi, Z., Li, M., Fan, W. (2019). A Novel Method for Highly Imbalanced Classification with Weighted Support Vector Machine. In: Douligeris, C., Karagiannis, D., Apostolou, D. (eds) Knowledge Science, Engineering and Management. KSEM 2019. Lecture Notes in Computer Science(), vol 11775. Springer, Cham. https://doi.org/10.1007/978-3-030-29551-6_24

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-29551-6_24

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-29550-9

  • Online ISBN: 978-3-030-29551-6

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics