
Pre-pruning Classification Trees to Reduce Overfitting in Noisy Domains

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2412)

Abstract

The automatic induction of classification rules from examples in the form of a classification tree is an important technique used in data mining. One problem encountered is the overfitting of rules to the training data, which in some cases leads to an excessively large number of rules, many of which have very little predictive value for unseen data. This paper describes a means of reducing overfitting known as J-pruning, based on the J-measure, an information-theoretic way of quantifying the information content of a rule. It is demonstrated that J-pruning generally leads to a substantial reduction in the number of rules generated and an increase in predictive accuracy. The advantage gained becomes more pronounced as the proportion of noise in the data increases.
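The J-measure itself is defined in the full paper; as background, following Smyth and Goodman (1991), the J-measure of a rule of the form "if Y = y then X = x" is J(X; Y = y) = p(y) · [p(x|y) log2(p(x|y)/p(x)) + (1 − p(x|y)) log2((1 − p(x|y))/(1 − p(x)))], the rule's average information content in bits. The Python sketch below is a minimal illustration of that formula only; it is not the paper's implementation, and the criterion J-pruning uses to decide when to stop expanding a branch is given in the full text.

```python
from math import log2

def j_measure(p_y: float, p_x: float, p_x_given_y: float) -> float:
    """J-measure of a rule 'if Y = y then X = x' (after Smyth & Goodman, 1991).

    p_y         -- probability that the rule's condition Y = y fires
    p_x         -- prior probability of the class X = x
    p_x_given_y -- probability of the class given that the condition fires
    Returns p(y) * j(X; Y = y), the rule's average information content in bits.
    """
    def term(p: float, q: float) -> float:
        # p * log2(p / q), taken as 0 when p == 0 (the usual convention).
        return 0.0 if p == 0 else p * log2(p / q)

    # j(X; Y = y): relative entropy of the posterior class distribution
    # with respect to the prior, over the two outcomes x and not-x.
    j = term(p_x_given_y, p_x) + term(1 - p_x_given_y, 1 - p_x)
    return p_y * j

# A rule that fires on 30% of examples and lifts the class probability
# from 0.5 to 0.9 carries roughly 0.16 bits of information:
print(round(j_measure(p_y=0.3, p_x=0.5, p_x_given_y=0.9), 3))  # 0.159
```

Intuitively, a rule scores highly only when it both fires often and sharply shifts the class probability away from the prior, which is why low-J rules are natural candidates for pruning.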

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bramer, M. (2002). Pre-pruning Classification Trees to Reduce Overfitting in Noisy Domains. In: Yin, H., Allinson, N., Freeman, R., Keane, J., Hubbard, S. (eds) Intelligent Data Engineering and Automated Learning — IDEAL 2002. Lecture Notes in Computer Science, vol 2412. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45675-9_2


  • DOI: https://doi.org/10.1007/3-540-45675-9_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44025-3

  • Online ISBN: 978-3-540-45675-9
