A Combination Classification Algorithm Based on Outlier Detection and C4.5

  • Conference paper
Advanced Data Mining and Applications (ADMA 2009)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5678)

Abstract

Traditional classifiers are biased toward the majority class on imbalanced data, which results in a high misclassification rate for minority samples. To address this problem, a combination classification algorithm based on outlier detection and C4.5 is presented. The basic idea of the algorithm is to balance the data distribution by grouping the whole data set into rare clusters and major clusters according to an outlier factor. The C4.5 algorithm is then used to build separate decision trees on the rare clusters and on the major clusters. When a new object is classified, the decision tree to apply is chosen according to the type of the cluster nearest to that object. We use data sets from the UCI Machine Learning Repository to perform the experiments and compare the results with those of other classification algorithms; the experiments demonstrate that our algorithm performs much better on extremely imbalanced data sets.
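The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not define the clustering method, the outlier factor, or the cut-off between rare and major clusters, so the one-pass clustering, the size-weighted distance score, and the `radius` and `rare_fraction` parameters are all stand-ins. A one-level entropy split (`train_stump`) takes the place of C4.5, and every function name here is hypothetical.

```python
import math
from collections import Counter

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(points):
    return tuple(sum(c) / len(points) for c in zip(*points))

def leader_clusters(X, radius):
    """Assumed one-pass clustering: join the nearest cluster within radius, else open a new one."""
    clusters = []  # each cluster is a list of row indices into X
    for i, x in enumerate(X):
        best, best_d = None, radius
        for c in clusters:
            d = dist(x, centroid([X[j] for j in c]))
            if d <= best_d:
                best, best_d = c, d
        if best is None:
            clusters.append([i])
        else:
            best.append(i)
    return clusters

def entropy(labels):
    n = len(labels)
    return -sum(k / n * math.log2(k / n) for k in Counter(labels).values())

def train_stump(X, y):
    """Stand-in for C4.5: the single threshold split with the highest information gain."""
    majority = Counter(y).most_common(1)[0][0]
    best_gain, rule = 0.0, None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X})[:-1]:  # dropping the max keeps both sides non-empty
            left = [lab for x, lab in zip(X, y) if x[f] <= t]
            right = [lab for x, lab in zip(X, y) if x[f] > t]
            gain = entropy(y) - (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
            if gain > best_gain:
                best_gain = gain
                rule = (f, t, Counter(left).most_common(1)[0][0], Counter(right).most_common(1)[0][0])
    if rule is None:  # no informative split: always predict the majority label
        return lambda x: majority
    f, t, left_label, right_label = rule
    return lambda x: left_label if x[f] <= t else right_label

def fit(X, y, radius, rare_fraction):
    clusters = leader_clusters(X, radius)
    cents = [centroid([X[j] for j in c]) for c in clusters]
    k = len(clusters)
    # Assumed outlier factor: size-weighted distance from a cluster to all other clusters.
    of = [sum(len(clusters[j]) * dist(cents[i], cents[j]) for j in range(k) if j != i)
          for i in range(k)]
    # Assumed cut-off: take clusters in descending outlier factor until the next one
    # would push the rare group past rare_fraction of the data.
    rare, covered = set(), 0
    for i in sorted(range(k), key=lambda i: -of[i]):
        if covered + len(clusters[i]) > rare_fraction * len(X):
            break
        rare.add(i)
        covered += len(clusters[i])

    def train_on(cluster_ids):
        rows = [j for i in cluster_ids for j in clusters[i]]
        if not rows:  # empty group: fall back to a model trained on all rows
            rows = range(len(X))
        return train_stump([X[j] for j in rows], [y[j] for j in rows])

    rare_model = train_on(rare)
    major_model = train_on(set(range(k)) - rare)

    def predict(x):
        # Route the object to the model matching the type of its nearest cluster.
        nearest = min(range(k), key=lambda i: dist(x, cents[i]))
        return (rare_model if nearest in rare else major_model)(x)
    return predict

# Toy data: a dense all-negative region and a small, distant region holding the minority class.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.2, 0.8), (0.8, 0.2), (0.3, 0.3),
     (10, 10), (10, 11), (11, 10), (11, 11)]
y = ["neg"] * 8 + ["neg", "neg", "pos", "pos"]
predict = fit(X, y, radius=3.0, rare_fraction=0.4)
print(predict((11, 10.5)), predict((10, 10.5)), predict((0.2, 0.3)))
```

On this toy data the small distant cluster is the one marked rare, so objects near it are classified by the model trained only on those four rows, while objects near the dense region go to the other model. Swapping `train_stump` for a full decision-tree learner leaves the routing logic unchanged.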

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jiang, S., Yu, W. (2009). A Combination Classification Algorithm Based on Outlier Detection and C4.5. In: Huang, R., Yang, Q., Pei, J., Gama, J., Meng, X., Li, X. (eds) Advanced Data Mining and Applications. ADMA 2009. Lecture Notes in Computer Science, vol 5678. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03348-3_50

  • DOI: https://doi.org/10.1007/978-3-642-03348-3_50

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-03347-6

  • Online ISBN: 978-3-642-03348-3

  • eBook Packages: Computer Science, Computer Science (R0)
