Abstract
Post-pruning of decision trees has been successful in many real-world experiments, but averaged over all possible concepts it brings no inherent improvement to an algorithm's performance. This work examines how a decision tree learning algorithm with proven PAC-learnability guarantees compares with two variants of standard top-down induction of decision trees. The algorithm does not prune its hypothesis as such, but it can be understood as pre-pruning the evolving tree. We study a backtracking search algorithm, called Rank, for learning rank-minimal decision trees. Our experiments closely follow those performed by Schaffer [20] and confirm his main findings: pruning helps when learning concepts with a simple description, but for concepts with a complex description, and when all concepts are equally likely, pruning harms rather than helps the average performance of greedy top-down induction of decision trees. Pre-pruning, as a gentler technique, yields average performance in the middle ground between not pruning at all and post-pruning.
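As background for readers unfamiliar with rank-minimal trees, the following is a minimal sketch of what a rank-bounded backtracking search can look like, assuming binary attributes and the rank definition of Ehrenfeucht and Haussler [6]: a leaf has rank 0, and an internal node whose subtrees have ranks r0 and r1 has rank max(r0, r1) if they differ and r0 + 1 otherwise. This is an illustration only, not the Rank implementation evaluated in the paper; all function names here are hypothetical. Examples are (attribute-vector, label) pairs.

    def build_tree(examples, attrs, r):
        # Return a decision tree of rank <= r that is consistent with the
        # examples, or None if no such tree exists over the given attributes.
        labels = {label for _, label in examples}
        if len(labels) <= 1:            # pure or empty sample: a rank-0 leaf
            return ("leaf", labels.pop() if labels else None)
        if r == 0 or not attrs:         # mixed labels but no budget or attributes left
            return None
        for a in attrs:
            left = [e for e in examples if e[0][a] == 0]
            right = [e for e in examples if e[0][a] == 1]
            rest = [b for b in attrs if b != a]
            # A node has rank <= r only if at least one subtree has rank <= r - 1;
            # try both assignments of the tighter budget, backtracking on failure.
            for r_left, r_right in ((r - 1, r), (r, r - 1)):
                t_left = build_tree(left, rest, r_left)
                if t_left is None:
                    continue
                t_right = build_tree(right, rest, r_right)
                if t_right is not None:
                    return ("node", a, t_left, t_right)
        return None

    def rank_minimal_tree(examples, n_attrs):
        # Iterative deepening on the rank bound: the first success is rank-minimal,
        # since no tree of any smaller rank was consistent with the sample.
        for r in range(n_attrs + 1):
            tree = build_tree(examples, list(range(n_attrs)), r)
            if tree is not None:
                return tree, r
        return None, None               # contradictory labels: no consistent tree

For instance, rank_minimal_tree([((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)], 2) fails at rank 0 (the sample is not pure) and succeeds at rank 1, the rank of boolean OR. The pre-pruning effect discussed in the paper arises because bounding the rank caps how unbalanced and deep the hypothesis tree can grow.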
References
Angluin, D., Laird, P.: Learning from noisy examples. Mach. Learn. 2 (1988) 343–370
Breiman, L., Friedman, J., Olshen, R., Stone, C.: Classification and Regression Trees. Wadsworth, Pacific Grove, CA (1984)
Domingos, P.: A process-oriented heuristic for model selection. In: Shavlik, J. (ed.): Machine Learning: Proceedings of the Fifteenth International Conference. Morgan Kaufmann, San Francisco, CA (1998) 127–135
Domingos, P.: Occam’s two razors: the sharp and the blunt. In: Agrawal, R., Stolorz, P., Piatetsky-Shapiro, G. (eds.): Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining. AAAI Press, Menlo Park, CA (1998) 37–43
Domingos, P.: Process-oriented estimation of generalization error. In: Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence. Morgan Kaufmann, San Francisco, CA (to appear)
Ehrenfeucht, A., Haussler, D.: Learning decision trees from random examples. Inf. Comput. 82 (1989) 231–246
Elomaa, T.: Tools and techniques for decision tree learning. Report A-1996-2, Department of Computer Science, University of Helsinki (1996)
Elomaa, T., Kivinen, J.: Learning decision trees from noisy examples. Report A-1991-3, Department of Computer Science, University of Helsinki (1991)
Hancock, T., Jiang, T., Li, M., Tromp, J.: Lower bounds on learning decision lists and trees. Inf. Comput. 126 (1996) 114–122
Holder, L. B.: Intermediate decision trees. In: Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence. Morgan Kaufmann, San Francisco, CA (1995) 1056–1061
Holte, R. C.: Very simple classification rules perform well on most commonly used datasets. Mach. Learn. 11 (1993) 63–90
Murthy, S. K., Salzberg, S.: Lookahead and pathology in decision tree induction. In: Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence. Morgan Kaufmann, San Francisco, CA (1995) 1025–1031
Oates, T., Jensen, D.: The effects of training set size on decision tree complexity. In: Fisher, D. H. (ed.): Machine Learning: Proceedings of the Fourteenth International Conference. Morgan Kaufmann, San Francisco, CA (1997) 254–261
Quinlan, J. R.: Learning efficient classification procedures and their application to chess end games. In: Michalski, R., Carbonell, J., Mitchell, T. (eds.): Machine Learning: An Artificial Intelligence Approach. Tioga, Palo Alto, CA (1983) 391–411
Quinlan, J. R.: Induction of decision trees. Mach. Learn. 1 (1986) 81–106
Quinlan, J. R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, CA (1993)
Quinlan, J. R.: Improved use of continuous attributes in C4.5. J. Artif. Intell. Res. 4 (1996) 77–90
Rao, R. B., Gordon, D. F., Spears, W. M.: For every generalization action, is there really an equal and opposite reaction? Analysis of the conservation law for generalization performance. In: Prieditis, A., Russell, S. (eds.): Machine Learning: Proceedings of the Twelfth International Conference. Morgan Kaufmann, San Francisco, CA (1995) 471–479
Sakakibara, Y.: Noise-tolerant Occam algorithms and their applications to learning decision trees. Mach. Learn. 11 (1993) 37–62
Schaffer, C.: Overfitting avoidance as bias. Mach. Learn. 10 (1993) 153–178
Schaffer, C.: A conservation law for generalization performance. In: Cohen, W. W., Hirsh, H. (eds.): Machine Learning: Proceedings of the Eleventh International Conference. Morgan Kaufmann, San Francisco, CA (1994) 259–265
Valiant, L. G.: A theory of the learnable. Commun. ACM 27 (1984) 1134–1142
Wang, C., Venkatesh, S. S., Judd, J. S.: Optimal stopping and effective machine complexity in learning. In: Cowan, J. D., Tesauro, G., Alspector, J. (eds.): Advances in Neural Information Processing Systems, Vol. 6. Morgan Kaufmann, San Francisco, CA (1994) 303–310
Wolpert, D. H.: The lack of a priori distinctions between learning algorithms. Neural Comput. 8 (1996) 1341–1390
Copyright information
© 1999 Springer-Verlag Berlin Heidelberg
Cite this paper
Elomaa, T. (1999). The Biases of Decision Tree Pruning Strategies. In: Hand, D.J., Kok, J.N., Berthold, M.R. (eds) Advances in Intelligent Data Analysis. IDA 1999. Lecture Notes in Computer Science, vol 1642. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48412-4_6
Print ISBN: 978-3-540-66332-4
Online ISBN: 978-3-540-48412-7