Discretizing Continuous Attributes Using Information Theory

  • Conference paper
Computer and Information Sciences - ISCIS 2005 (ISCIS 2005)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3733)

Abstract

Many classification algorithms require that training examples contain only discrete values. To use these algorithms when some attributes have continuous numeric values, the numeric attributes must first be converted into discrete ones. This paper describes a new method for discretizing numeric values using information theory. The amount of information each interval provides about the target attribute is measured using Hellinger divergence, and the interval boundaries are chosen so that the intervals contain as nearly equal amounts of information as possible. To compare our method with several existing discretization methods, we discretize a number of popular classification data sets and use the naive Bayesian classifier and C4.5 as classification tools to compare the resulting accuracy.
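As a rough illustration of the idea in the abstract, the sketch below measures how much one candidate interval tells us about the class by taking the Hellinger distance between the overall class distribution and the class distribution inside that interval. This is only one plausible reading: the abstract does not give the paper's exact formula or its boundary search procedure, so the unnormalized form of the distance, the half-open interval convention, and the helper names `hellinger`, `class_distribution`, and `interval_information` are all assumptions made for this example.

```python
import math
from collections import Counter


def hellinger(p, q):
    """Hellinger distance between two discrete distributions.

    Assumption: the unnormalized form sqrt(sum((sqrt(p_i) - sqrt(q_i))^2)).
    Some texts include an extra factor of 1/sqrt(2).
    """
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def class_distribution(labels, classes):
    """Relative frequency of each class in `labels`, ordered as in `classes`."""
    counts = Counter(labels)
    total = len(labels)
    return [counts[c] / total for c in classes]


def interval_information(values, labels, low, high):
    """Distance between the class prior and the class distribution in [low, high).

    Hypothetical helper: an empty interval is scored as 0.0 by convention.
    """
    classes = sorted(set(labels))
    inside = [y for x, y in zip(values, labels) if low <= x < high]
    if not inside:
        return 0.0
    prior = class_distribution(labels, classes)
    posterior = class_distribution(inside, classes)
    return hellinger(prior, posterior)


# Toy data: low values belong to class "a", high values to class "b".
values = [1.0, 1.5, 2.0, 6.0, 6.5, 7.0]
labels = ["a", "a", "a", "b", "b", "b"]

left = interval_information(values, labels, 0.0, 4.0)
right = interval_information(values, labels, 4.0, 8.0)
whole = interval_information(values, labels, 0.0, 8.0)
print(left, right, whole)
```

On this toy data a cut at 4.0 gives two intervals with the same score, which is the kind of balance the abstract describes, while the single interval covering all values scores zero because it tells us nothing beyond the class prior. A full discretizer would still need a search over candidate boundaries, which is not attempted here.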


Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lee, CH. (2005). Discretizing Continuous Attributes Using Information Theory. In: Yolum, P., Güngör, T., Gürgen, F., Özturan, C. (eds) Computer and Information Sciences - ISCIS 2005. ISCIS 2005. Lecture Notes in Computer Science, vol 3733. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11569596_52

  • DOI: https://doi.org/10.1007/11569596_52

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29414-6

  • Online ISBN: 978-3-540-32085-2

  • eBook Packages: Computer Science, Computer Science (R0)
