CD: A Coupled Discretization Algorithm

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7302)

Abstract

Discretization plays an important role in data mining and machine learning. Although numeric data predominates in the real world, many supervised learning algorithms are restricted to discrete variables. A variety of research has therefore been conducted on discretization, the process of converting continuous attribute values into a limited number of intervals. Recent work derived from entropy-based discretization methods has produced impressive results; it introduces information attribute dependency to reduce the uncertainty level of a decision table, but pays no attention to increasing the degree of certainty from the perspective of the positive domain ratio. This paper proposes a discretization algorithm based on both the positive domain and its coupling with information entropy, which considers not only information attribute dependency but also the deterministic feature relationship. Substantial experiments on a wide range of UCI data sets provide evidence that the proposed coupled discretization algorithm generally outperforms seven other existing methods, as well as the positive domain based algorithm also proposed in this paper, in terms of simplicity, stability, consistency, and accuracy.
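
The two quantities the abstract contrasts can be illustrated with a small sketch. This is not the paper's CD algorithm, whose details are not available in this preview. It only assumes the usual rough-set reading of the positive domain ratio (the fraction of objects whose interval contains a single class) and the standard conditional entropy of the class given the interval. The function names, cut points, and toy data are hypothetical.

```python
from bisect import bisect_right
from collections import Counter, defaultdict
from math import log2


def discretize(values, cuts):
    """Map each continuous value to the index of its interval (cuts must be sorted)."""
    return [bisect_right(cuts, v) for v in values]


def positive_region_ratio(bins, labels):
    """Fraction of objects that fall in an interval containing only one class label."""
    classes = defaultdict(set)
    for b, y in zip(bins, labels):
        classes[b].add(y)
    consistent = sum(1 for b in bins if len(classes[b]) == 1)
    return consistent / len(bins)


def conditional_entropy(bins, labels):
    """H(label | interval) in bits: the remaining uncertainty about the class."""
    n = len(bins)
    by_bin = defaultdict(Counter)
    for b, y in zip(bins, labels):
        by_bin[b][y] += 1
    h = 0.0
    for counts in by_bin.values():
        size = sum(counts.values())
        h -= sum(c / n * log2(c / size) for c in counts.values())
    return h


# Hypothetical toy attribute with class labels (not from the paper).
values = [1.0, 1.5, 2.0, 3.0, 3.5, 4.0]
labels = ["a", "a", "a", "b", "b", "a"]

coarse = discretize(values, [2.5])        # two intervals
fine = discretize(values, [2.5, 3.75])    # three intervals

for name, bins in (("coarse", coarse), ("fine", fine)):
    print(name, bins,
          positive_region_ratio(bins, labels),
          round(conditional_entropy(bins, labels), 4))
```

On this toy data the extra cut point raises the positive region ratio and lowers the conditional entropy together; how the two criteria are actually coupled is specified in the paper itself, not here.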

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wang, C., Wang, M., She, Z., Cao, L. (2012). CD: A Coupled Discretization Algorithm. In: Tan, PN., Chawla, S., Ho, C.K., Bailey, J. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2012. Lecture Notes in Computer Science, vol 7302. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30220-6_34

  • DOI: https://doi.org/10.1007/978-3-642-30220-6_34

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-30219-0

  • Online ISBN: 978-3-642-30220-6

  • eBook Packages: Computer Science, Computer Science (R0)
