Building simple models: A case study with decision trees

Jensen, David; Oates, Tim; Cohen, Paul R.

doi:10.1007/BFb0052842

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1280)

Included in the following conference series:

International Symposium on Intelligent Data Analysis

Abstract

Building correctly-sized models is a central challenge for induction algorithms. Many approaches to decision tree induction fail this challenge. Under a broad range of circumstances, these approaches exhibit a nearly linear relationship between training set size and tree size, even after accuracy has ceased to increase. These algorithms fail to adjust for the statistical effects of comparing multiple subtrees. Adjusting for these effects produces trees with little or no excess structure.
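The abstract does not say which adjustment the authors apply when comparing multiple subtrees. As a minimal sketch, assuming a Bonferroni-style correction (an assumption made here for illustration, not a description of the paper's procedure), the Python below shows why unadjusted significance tests admit excess structure as more candidate subtrees are compared, and how dividing the significance level by the number of comparisons bounds that effect. The function names and the independence assumption in `familywise_error` are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_comparisons: int) -> float:
    """Per-comparison significance level under a Bonferroni correction.

    Keeps the chance of at least one false positive across all
    comparisons at or below alpha.
    """
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return alpha / n_comparisons


def familywise_error(alpha_per_test: float, n_comparisons: int) -> float:
    """Chance of at least one false positive among n tests.

    Assumes the tests are independent, which is a simplification.
    """
    return 1.0 - (1.0 - alpha_per_test) ** n_comparisons


def keep_subtree(p_value: float, alpha: float, n_comparisons: int) -> bool:
    """Retain a subtree only if its p-value passes the adjusted threshold."""
    return p_value <= bonferroni_threshold(alpha, n_comparisons)


if __name__ == "__main__":
    alpha, n = 0.05, 20  # hypothetical: 20 candidate subtrees compared
    print("unadjusted family-wise error:", familywise_error(alpha, n))
    adjusted = bonferroni_threshold(alpha, n)
    print("adjusted per-test threshold:", adjusted)
    print("adjusted family-wise error:", familywise_error(adjusted, n))
```

With 20 comparisons at an unadjusted level of 0.05, the chance of retaining at least one spurious subtree is well over one half under the independence assumption. After the adjustment it stays below 0.05, which is consistent with the abstract's claim that adjusting for multiple comparisons leaves little or no excess structure.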

Author information

Authors and Affiliations

Department of Computer Science, University of Massachusetts, Amherst, MA 01003
David Jensen, Tim Oates & Paul R. Cohen

Editor information

Xiaohui Liu, Paul Cohen, Michael Berthold

About this paper

Cite this paper

Jensen, D., Oates, T., Cohen, P.R. (1997). Building simple models: A case study with decision trees. In: Liu, X., Cohen, P., Berthold, M. (eds) Advances in Intelligent Data Analysis: Reasoning about Data. IDA 1997. Lecture Notes in Computer Science, vol 1280. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0052842

DOI: https://doi.org/10.1007/BFb0052842
Published: 19 May 2006
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63346-4
Online ISBN: 978-3-540-69520-2
eBook Packages: Springer Book Archive
