Abstract
Many classification algorithms require that training examples contain only discrete values. To apply these algorithms when some attributes take continuous numeric values, the numeric attributes must first be converted into discrete ones. This paper describes a new way of discretizing numeric values using information theory: the amount of information each interval conveys about the target attribute is measured with the Hellinger divergence, and the interval boundaries are chosen so that each interval carries as nearly equal an amount of information as possible. To compare our discretization method with several current methods, we discretize a number of popular classification data sets and use the naive Bayesian classifier and C4.5 as classification tools to compare the resulting accuracies.
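The equal-information idea in the abstract can be sketched in code. The paper's exact per-interval measure is not reproduced here, so the sketch below makes two labeled assumptions: each example's information contribution is approximated by the Hellinger distance between the class distribution of a small sliding window around it and the overall class prior, and boundaries are then placed where the cumulative contribution crosses equal fractions of the total. Function and parameter names (`equal_information_bins`, `window`) are illustrative, not the authors'.

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete distributions p and q."""
    return np.sqrt(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)) / np.sqrt(2)

def equal_information_bins(values, labels, n_bins):
    """Sketch: cut a numeric attribute into n_bins intervals whose
    cumulative Hellinger 'information' about the class is roughly equal.
    Returns the n_bins - 1 cut points."""
    order = np.argsort(values)
    values = np.asarray(values, dtype=float)[order]
    labels = np.asarray(labels)[order]
    classes = np.unique(labels)
    prior = np.array([(labels == c).mean() for c in classes])

    # Assumed proxy for each example's information contribution:
    # divergence of a sliding window's class distribution from the prior.
    window = max(5, len(values) // 50)
    contrib = np.empty(len(values))
    for i in range(len(values)):
        lo, hi = max(0, i - window), min(len(values), i + window)
        seg = labels[lo:hi]
        p = np.array([(seg == c).mean() for c in classes])
        contrib[i] = hellinger(p, prior)

    # Place boundaries at equal fractions of the cumulative information.
    cum = np.cumsum(contrib)
    targets = cum[-1] * np.arange(1, n_bins) / n_bins
    cut_idx = np.searchsorted(cum, targets)
    return [values[i] for i in cut_idx]
```

Any divergence measure could be plugged into `hellinger`'s slot; the equalization step only needs a nonnegative per-example contribution to accumulate.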
References
Beran, R.J.: Minimum Hellinger Distances for Parametric Models. The Annals of Statistics 5, 445–463 (1977)
Kadota, T., Shepp, L.A.: On the Best Finite Set of Linear Observables for Discriminating Two Gaussian Signals. IEEE Transactions on Information Theory 13, 278–284 (1967)
Boulle, M.: Khiops: A Statistical Discretization Method of Continuous Attributes. Machine Learning 55, 53–69 (2004)
Catlett, J.: On changing continuous attributes into ordered discrete attributes. In: European Working Session on Learning (1991)
Dougherty, J., Kohavi, R., Sahami, M.: Supervised and Unsupervised Discretization of Continuous Features. In: 12th Int’l Conf. on Machine Learning (1995)
Fayyad, U.M., Irani, K.B.: Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning. In: 13th International Joint Conference of Artificial Intelligence, pp. 1022–1027 (1993)
Ying, Z.: Minimum Hellinger Distance Estimation for Censored Data. The Annals of Statistics 20(3) (1992)
Kononenko, I.: Inductive and Bayesian Learning in Medical Diagnosis. Applied Artificial Intelligence 7, 317–337 (1993)
Kullback, S.: Information Theory and Statistics. Dover Publications, New York (1968)
Murphy, P.M., Aha, D.W.: UCI repository of machine learning databases (1996), http://www.ics.uci.edu/~mlearn
Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, San Francisco (1993)
Renyi, A.: On Measures of Entropy and Information. In: Proceedings of Fourth Berkeley Symposium, vol. 1, pp. 547–561 (1961)
Weiss, S.M., Galen, R.S., Tadepalli, P.V.: Maximizing the predictive value of production rules. Artificial Intelligence 45, 47–71 (1990)
© 2005 Springer-Verlag Berlin Heidelberg
Lee, C.H. (2005). Discretizing Continuous Attributes Using Information Theory. In: Yolum, P., Güngör, T., Gürgen, F., Özturan, C. (eds) Computer and Information Sciences - ISCIS 2005. ISCIS 2005. Lecture Notes in Computer Science, vol 3733. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11569596_52
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29414-6
Online ISBN: 978-3-540-32085-2
eBook Packages: Computer Science, Computer Science (R0)