
An Improved Algorithm of Unbalanced Data SVM

  • Conference paper

Part of the book series: Advances in Intelligent and Soft Computing (AINSC, volume 78)

Abstract

Since SVM is biased against the rare class when classifying unbalanced data, a new balancing strategy based on the common approach of undersampling the training data is presented. First, the fuzzy C-means clustering algorithm is used to cluster the unbalanced data set, and the negative-class samples whose memberships exceed a certain threshold are selected (assuming the positive class has fewer samples than the negative class). The selected samples and the positive-class samples of the original data are combined into a new training set, which is then used to train a support vector machine. Finally, simulations on unbalanced data show that the proposed algorithm can compensate for the bias that arises when support vector machines are applied to unbalanced data classification. Moreover, compared with the traditional support vector machine and some other improved algorithms, the proposed algorithm achieves superior classification performance.
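
The undersampling step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which data are clustered or how the membership is compared with the threshold, so the sketch assumes fuzzy C-means is run on the negative class only and a negative sample is kept when its largest cluster membership exceeds the threshold. The function names, the label convention (+1 positive, -1 negative), the fuzzifier `m = 2`, the cluster count `c = 2`, and the threshold `0.7` are all hypothetical choices.

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Plain fuzzy C-means. Returns the membership matrix U of shape (n_samples, c)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)              # memberships of each sample sum to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]   # membership-weighted means
        # Euclidean distance from every sample to every center,
        # floored to avoid division by zero
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)   # standard FCM membership update
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U

def undersample_negatives(X, y, threshold=0.7, c=2, seed=0):
    """Keep every positive sample (y == 1) and only those negatives (y == -1)
    whose largest cluster membership exceeds `threshold` (an assumed rule)."""
    pos, neg = X[y == 1], X[y == -1]
    U = fuzzy_c_means(neg, c=c, seed=seed)
    keep = U.max(axis=1) > threshold
    X_new = np.vstack([pos, neg[keep]])
    y_new = np.concatenate([np.ones(len(pos)), -np.ones(int(keep.sum()))])
    return X_new, y_new

# Synthetic unbalanced data: 200 negatives, 20 positives (illustrative only)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(3, 1, (20, 2))])
y = np.concatenate([-np.ones(200), np.ones(20)])
X_new, y_new = undersample_negatives(X, y, threshold=0.7)
print(X_new.shape, int((y_new == -1).sum()), int((y_new == 1).sum()))
```

The reduced set `(X_new, y_new)` would then be passed to an SVM trainer, for example scikit-learn's `SVC().fit(X_new, y_new)`. How strongly the negative class shrinks depends on the threshold and the number of clusters, neither of which is given in the abstract.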




Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhou, B., Ha, M., Wang, C. (2010). An Improved Algorithm of Unbalanced Data SVM. In: Cao, By., Wang, Gj., Guo, Sz., Chen, Sl. (eds) Fuzzy Information and Engineering 2010. Advances in Intelligent and Soft Computing, vol 78. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14880-4_60


  • DOI: https://doi.org/10.1007/978-3-642-14880-4_60

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-14879-8

  • Online ISBN: 978-3-642-14880-4

  • eBook Packages: Engineering, Engineering (R0)
