Several SVM Ensemble Methods Integrated with Under-Sampling for Imbalanced Data Learning

  • Conference paper
Advanced Data Mining and Applications (ADMA 2009)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5678)

Abstract

Imbalanced data learning (IDL) is one of the most active and important fields in machine learning research. This paper explores the effectiveness of four different SVM ensemble methods integrated with under-sampling in IDL. Experimental results on 20 imbalanced UCI datasets show that the two new ensemble algorithms proposed in this paper, CABagE (bagging-style) and MABstE (boosting-style), produce SVM ensemble classifiers that recognize the minority class better than existing ensemble methods do. Further analysis of the experimental results indicates that MABstE has the best overall classification performance, which we attribute to its more robust example-weighting mechanism.
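To make the general idea concrete, the following is a minimal sketch of a generic bagging-style SVM ensemble with random under-sampling of the majority class. It is not CABagE or MABstE, whose details are not given in this abstract. All function names and parameters are hypothetical, the base learner is a simple linear SVM trained with Pegasos-style sub-gradient descent (an assumption made to keep the sketch dependency-free), labels are assumed to be +1 for the minority class and -1 for the majority class, and the minority class is assumed to be no larger than the majority class.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Pegasos-style sub-gradient training of a linear SVM; labels are -1 or +1."""
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)          # last weight acts as the bias term
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)        # decaying step size
            xi = list(X[i]) + [1.0]      # constant feature for the bias
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1.0 - eta * lam) * wj for wj in w]   # regularization shrink
            if margin < 1.0:             # hinge loss active: move toward y[i]
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, xi)]
    return w

def svm_predict(w, x):
    """Return +1 or -1 from the sign of the linear score."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1

def under_sampled_bagging(X, y, n_models=5, seed=0):
    """Train each SVM on all minority examples plus an equally sized
    random subset of the majority class (assumed: +1 is the minority)."""
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == 1]
    majority = [i for i, label in enumerate(y) if label == -1]
    models = []
    for m in range(n_models):
        subset = minority + rng.sample(majority, len(minority))
        models.append(train_linear_svm([X[i] for i in subset],
                                       [y[i] for i in subset], seed=seed + m))
    return models

def ensemble_predict(models, x):
    """Unweighted majority vote; use an odd number of models to avoid ties."""
    votes = sum(svm_predict(w, x) for w in models)
    return 1 if votes > 0 else -1

# Toy imbalanced data: 200 majority points and 20 minority points.
data_rng = random.Random(42)
X = [[data_rng.uniform(-3, -1), data_rng.uniform(-3, -1)] for _ in range(200)]
X += [[data_rng.uniform(1, 3), data_rng.uniform(1, 3)] for _ in range(20)]
y = [-1] * 200 + [1] * 20

models = under_sampled_bagging(X, y, n_models=5)
print(ensemble_predict(models, [2.0, 2.0]), ensemble_predict(models, [-2.0, -2.0]))
```

Each base classifier here sees a balanced training set, so no single SVM is dominated by the majority class, while the ensemble as a whole still draws on many different majority examples. A boosting-style variant would instead reweight or resample examples between rounds based on earlier errors.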

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lin, Z., Hao, Z., Yang, X., Liu, X. (2009). Several SVM Ensemble Methods Integrated with Under-Sampling for Imbalanced Data Learning. In: Huang, R., Yang, Q., Pei, J., Gama, J., Meng, X., Li, X. (eds) Advanced Data Mining and Applications. ADMA 2009. Lecture Notes in Computer Science, vol 5678. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03348-3_54

  • DOI: https://doi.org/10.1007/978-3-642-03348-3_54

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-03347-6

  • Online ISBN: 978-3-642-03348-3

  • eBook Packages: Computer Science, Computer Science (R0)
