The minimum description length based decision tree pruning

  • Induction (Decision Tree Pruning, Feature Selection, Feature Discretization)
  • Conference paper
PRICAI’98: Topics in Artificial Intelligence (PRICAI 1998)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1531)

Abstract

We describe Minimum Description Length (MDL) based decision tree pruning. A subtree is considered unreliable, and is therefore pruned, if the description length of the classification of the corresponding subsets of training instances, together with the description lengths of the paths in the subtree, exceeds the description length of the classification of the whole subset of training instances in the current node. We compare the performance of our simple, parameterless, and well-founded MDL method with several other methods on 18 datasets. The classification accuracy of MDL pruning is comparable to that of the other approaches, and the resulting trees are nearly optimally pruned, which makes our method an attractive tool for obtaining a first approximation of the target decision tree during the knowledge discovery process.
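
The pruning test described above can be sketched in code. The following Python fragment is a minimal illustration only, not the paper's implementation: the Node structure, the multinomial coding used for the class labels, and the flat per-edge path cost (path_bits) are assumptions made for this example; the paper's exact coding of paths and class distributions may differ.

```python
import math
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    # Class frequencies of the training instances reaching this node
    # (hypothetical structure, for illustration only).
    class_counts: List[int]
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children

def log2_comb(n: int, k: int) -> float:
    """log2 of the binomial coefficient C(n, k), computed via lgamma."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1)
            - math.lgamma(n - k + 1)) / math.log(2)

def class_dl(counts: List[int]) -> float:
    """Bits to encode the class labels of a subset: first the class
    frequency vector, then the label sequence given those counts."""
    n, c = sum(counts), len(counts)
    dist_bits = log2_comb(n + c - 1, c - 1)          # which count vector
    labels_bits = (math.lgamma(n + 1)                # log2 multinomial:
                   - sum(math.lgamma(k + 1)          # n! / (n_1!...n_c!)
                         for k in counts)) / math.log(2)
    return dist_bits + labels_bits

def mdl_prune(node: Node, path_bits: float = 1.0) -> float:
    """Bottom-up pruning: replace a subtree by a leaf whenever coding
    the node's labels directly costs no more than coding the labels in
    its leaves plus an (assumed) flat cost per path edge."""
    if node.is_leaf():
        return class_dl(node.class_counts)
    subtree_bits = sum(mdl_prune(child, path_bits) + path_bits
                       for child in node.children)
    leaf_bits = class_dl(node.class_counts)
    if leaf_bits <= subtree_bits:        # subtree judged unreliable
        node.children = []               # prune: node becomes a leaf
        return leaf_bits
    return subtree_bits

# Toy example: a split that barely separates the classes is pruned.
root = Node([10, 10], [Node([6, 4]), Node([4, 6])])
mdl_prune(root)
print(root.is_leaf())                    # True: split judged unreliable
```

The recursion prunes children before judging their parent, so pruning decisions propagate bottom-up, exactly as a description-length comparison between a node and its subtree requires.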

Author information

I. Kononenko

Editor information

Hing-Yan Lee, Hiroshi Motoda

Copyright information

© 1998 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kononenko, I. (1998). The minimum description length based decision tree pruning. In: Lee, HY., Motoda, H. (eds) PRICAI’98: Topics in Artificial Intelligence. PRICAI 1998. Lecture Notes in Computer Science, vol 1531. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0095272

  • DOI: https://doi.org/10.1007/BFb0095272

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-65271-7

  • Online ISBN: 978-3-540-49461-4
