Building simple models: A case study with decision trees

  • Conference paper
Advances in Intelligent Data Analysis Reasoning about Data (IDA 1997)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1280)

Abstract

Building correctly-sized models is a central challenge for induction algorithms. Many approaches to decision tree induction fail this challenge. Under a broad range of circumstances, these approaches exhibit a nearly linear relationship between training set size and tree size, even after accuracy has ceased to increase. These algorithms fail to adjust for the statistical effects of comparing multiple subtrees. Adjusting for these effects produces trees with little or no excess structure.
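The statistical effect named in the abstract can be illustrated with a small sketch. It is not the authors' procedure; it assumes a generic Bonferroni/Šidák-style correction, treats the comparisons as independent, and uses hypothetical function names. It shows why a subtree that looks significant in isolation may not survive once the number of candidate subtrees compared is taken into account.

```python
def family_wise_error(alpha: float, n_comparisons: int) -> float:
    """Chance of at least one false positive across n independent tests,
    each run at per-test significance level alpha."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return 1.0 - (1.0 - alpha) ** n_comparisons


def bonferroni_threshold(alpha: float, n_comparisons: int) -> float:
    """Per-test threshold that keeps the family-wise error at or below alpha."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return alpha / n_comparisons


def sidak_adjusted_p(p_value: float, n_comparisons: int) -> float:
    """Sidak-adjusted p-value for the best of n independent candidates."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return 1.0 - (1.0 - p_value) ** n_comparisons


def keep_subtree(p_value: float, n_comparisons: int, alpha: float = 0.05) -> bool:
    """Hypothetical pruning rule: retain a subtree only if its adjusted
    p-value is still significant after accounting for all candidates."""
    return sidak_adjusted_p(p_value, n_comparisons) <= alpha


if __name__ == "__main__":
    # With 20 candidates tested at alpha = 0.05, a spurious "winner" is likely.
    print(family_wise_error(0.05, 20))
    # p = 0.01 passes when it is the only comparison, but not as the best of 20.
    print(keep_subtree(0.01, 1), keep_subtree(0.01, 20))
```

Under these assumptions, an unadjusted test admits more spurious subtrees as more candidates are compared, which is consistent with tree size continuing to grow after accuracy has levelled off.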

Editor information

Xiaohui Liu, Paul Cohen, Michael Berthold

Copyright information

© 1997 Springer-Verlag

About this paper

Cite this paper

Jensen, D., Oates, T., Cohen, P.R. (1997). Building simple models: A case study with decision trees. In: Liu, X., Cohen, P., Berthold, M. (eds) Advances in Intelligent Data Analysis Reasoning about Data. IDA 1997. Lecture Notes in Computer Science, vol 1280. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0052842

  • DOI: https://doi.org/10.1007/BFb0052842

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-63346-4

  • Online ISBN: 978-3-540-69520-2

  • eBook Packages: Springer Book Archive
