Abstract
We introduce a numerical measure on sets of partitions of finite sets that is linked to the Goodman-Kruskal association index commonly used in statistics. This measure allows us to define a metric on such partitions, which we use for constructing decision trees. Experimental results suggest that by replacing the usual splitting criterion used in C4.5 with a metric criterion based on the Goodman-Kruskal coefficient, it is possible, in most cases, to obtain smaller decision trees without sacrificing accuracy.
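To make the idea of a partition-based measure concrete, the following is a minimal sketch and not the paper's implementation. It assumes the index takes the classic prediction-error form: the fraction of elements misclassified when the block of one partition is guessed from the block of the other by majority vote. It also assumes the distance is the sum of the two directed errors. The names `gk_error` and `gk_distance` are hypothetical, and the exact definitions used in the paper may differ.

```python
from collections import Counter, defaultdict


def gk_error(pi, sigma):
    """Directed prediction error of sigma given pi (assumed form).

    pi and sigma are equal-length sequences of block labels, one label
    per element of the underlying finite set. For each block of pi we
    predict the most frequent sigma label inside it; the result is the
    fraction of elements for which that prediction is wrong.
    """
    if len(pi) != len(sigma):
        raise ValueError("partitions must label the same set of elements")
    n = len(pi)
    if n == 0:
        raise ValueError("the underlying set must be non-empty")

    # counts[b][c] = number of elements in block b of pi and block c of sigma
    counts = defaultdict(Counter)
    for b, c in zip(pi, sigma):
        counts[b][c] += 1

    correctly_predicted = sum(max(row.values()) for row in counts.values())
    return 1 - correctly_predicted / n


def gk_distance(pi, sigma):
    """Symmetrized error (assumption: sum of the two directed errors)."""
    return gk_error(pi, sigma) + gk_error(sigma, pi)


if __name__ == "__main__":
    attribute_partition = [0, 0, 1, 1]   # blocks induced by a hypothetical attribute
    class_partition = [0, 1, 1, 1]       # blocks induced by hypothetical class labels
    print(gk_distance(attribute_partition, class_partition))
```

Under these assumptions, a tree builder would pick, at each node, the attribute whose partition is closest to the class partition. Block labels are arbitrary, so two partitions that differ only by renaming of blocks are at distance zero.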
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Simovici, D.A., Jaroszewicz, S. (2004). A Metric Approach to Building Decision Trees Based on Goodman-Kruskal Association Index. In: Dai, H., Srikant, R., Zhang, C. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2004. Lecture Notes in Computer Science, vol 3056. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24775-3_23
DOI: https://doi.org/10.1007/978-3-540-24775-3_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22064-0
Online ISBN: 978-3-540-24775-3
eBook Packages: Springer Book Archive