Abstract
Building correctly sized models is a central challenge for induction algorithms. Many approaches to decision tree induction fail this challenge. Under a broad range of circumstances, these approaches exhibit a nearly linear relationship between training set size and tree size, even after accuracy has ceased to increase. These algorithms fail to adjust for the statistical effects of comparing multiple subtrees. Adjusting for these effects produces trees with little or no excess structure.
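The effect of comparing multiple subtrees can be illustrated with a small sketch. The abstract does not say which adjustment is used, so the Bonferroni correction below is an assumption chosen for simplicity, and the function names (`familywise_error`, `bonferroni_alpha`, `keep_subtree`) are hypothetical. The sketch shows how the chance of retaining at least one spurious subtree grows with the number of candidates compared, and how tightening the per-test threshold counteracts it.

```python
def familywise_error(alpha: float, k: int) -> float:
    """Probability that at least one of k independent tests rejects
    at level alpha when no real effect exists (all nulls true)."""
    return 1.0 - (1.0 - alpha) ** k


def bonferroni_alpha(alpha: float, k: int) -> float:
    """Per-test threshold under a Bonferroni correction (an assumed
    adjustment, not necessarily the one used in the paper)."""
    return alpha / k


def keep_subtree(p_value: float, alpha: float, k: int) -> bool:
    """Keep a subtree only if its p-value passes the adjusted threshold,
    where k is the number of candidate subtrees that were compared."""
    return p_value <= bonferroni_alpha(alpha, k)


if __name__ == "__main__":
    alpha = 0.05
    for k in (1, 5, 20, 100):
        unadjusted = familywise_error(alpha, k)
        adjusted = familywise_error(bonferroni_alpha(alpha, k), k)
        print(f"k={k:3d}  unadjusted={unadjusted:.3f}  adjusted={adjusted:.3f}")
```

With an unadjusted threshold, comparing more candidates makes it increasingly likely that some subtree looks significant by chance, which is one way excess structure can accumulate as training sets grow. Under the adjusted threshold the overall error rate stays at or below the nominal level.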
Copyright information
© 1997 Springer-Verlag
About this paper
Cite this paper
Jensen, D., Oates, T., Cohen, P.R. (1997). Building simple models: A case study with decision trees. In: Liu, X., Cohen, P., Berthold, M. (eds) Advances in Intelligent Data Analysis Reasoning about Data. IDA 1997. Lecture Notes in Computer Science, vol 1280. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0052842
DOI: https://doi.org/10.1007/BFb0052842
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63346-4
Online ISBN: 978-3-540-69520-2
eBook Packages: Springer Book Archive