A Further Comparison of Simplification Methods for Decision-Tree Induction

Chapter in: Learning from Data

Part of the book series: Lecture Notes in Statistics (LNS, volume 112)

Abstract

This paper presents an empirical investigation of eight well-known simplification methods for decision trees induced from training data. Twelve data sets are used to compare both the accuracy and the complexity of the simplified trees. Optimally pruned trees are computed in order to give a precise definition of each method's bias towards overpruning or underpruning. The results indicate that the simplification strategies that exploit an independent pruning set do not perform better than the others, and that some methods show a marked bias towards either underpruning or overpruning.
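To make the notion of a "simplification strategy which exploits an independent pruning set" concrete, here is a minimal Python sketch of reduced-error pruning, one well-known method of this kind: a subtree is replaced by a leaf whenever the leaf misclassifies no more pruning-set examples than the subtree does. This is an illustrative sketch under simplifying assumptions (binary features, a tiny hand-built tree), not the implementation evaluated in the paper; all names (`Node`, `reduced_error_prune`, etc.) are our own.

```python
class Node:
    """A binary decision-tree node over boolean features."""
    def __init__(self, feature=None, left=None, right=None, label=None):
        self.feature = feature  # index of the boolean feature tested here
        self.left = left        # subtree followed when the feature is 0
        self.right = right      # subtree followed when the feature is 1
        self.label = label      # class label if this node is a leaf

    def is_leaf(self):
        return self.label is not None

def predict(node, x):
    """Route example x down the tree and return the leaf's label."""
    while not node.is_leaf():
        node = node.right if x[node.feature] else node.left
    return node.label

def errors(node, data):
    """Number of misclassified (x, y) pairs in data."""
    return sum(predict(node, x) != y for x, y in data)

def majority_label(data, default):
    if not data:
        return default
    labels = [y for _, y in data]
    return max(set(labels), key=labels.count)

def reduced_error_prune(node, prune_set, default=0):
    """Bottom-up pruning driven by an independent pruning set."""
    if node.is_leaf():
        return node
    left_data = [(x, y) for x, y in prune_set if not x[node.feature]]
    right_data = [(x, y) for x, y in prune_set if x[node.feature]]
    node.left = reduced_error_prune(node.left, left_data, default)
    node.right = reduced_error_prune(node.right, right_data, default)
    # Candidate replacement: a leaf predicting the majority class.
    leaf = Node(label=majority_label(prune_set, default))
    # Keep the simpler tree if it is no worse on the pruning set.
    if errors(leaf, prune_set) <= errors(node, prune_set):
        return leaf
    return node

# Toy example: the test on feature 1 adds nothing on the pruning set,
# so its subtree collapses to a leaf, while the root test survives.
root = Node(feature=0,
            left=Node(label=0),
            right=Node(feature=1, left=Node(label=0), right=Node(label=1)))
prune_set = [((0, 0), 0), ((0, 1), 0), ((1, 0), 1), ((1, 1), 1)]
pruned = reduced_error_prune(root, prune_set)
```

Because pruning decisions are made on data not used for growing the tree, the method has a natural guard against overfitting; the paper's point is that, empirically, holding out such a set does not yield better simplified trees than strategies that reuse the training data.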





Copyright information

© 1996 Springer-Verlag New York, Inc.

Cite this chapter

Malerba, D., Esposito, F., Semeraro, G. (1996). A Further Comparison of Simplification Methods for Decision-Tree Induction. In: Fisher, D., Lenz, H.-J. (eds) Learning from Data. Lecture Notes in Statistics, vol 112. Springer, New York, NY. https://doi.org/10.1007/978-1-4612-2404-4_35

  • DOI: https://doi.org/10.1007/978-1-4612-2404-4_35

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-0-387-94736-5

  • Online ISBN: 978-1-4612-2404-4

  • eBook Packages: Springer Book Archive
