Abstract
Experimental evidence shows that many of the attribute selection criteria used to induce decision trees perform comparably. We set up a theoretical framework that explains this empirical law and that furthermore provides an infinite family of criteria (the C.M. criteria) containing the most commonly used ones. We also define C.M. pruning, which is suited to uncertain domains. In uncertain domains such as medicine, some sub-trees that do not lower the error rate can still be relevant: they may point out populations of specific interest or give a legible representation of a large data file. C.M. pruning allows such sub-trees to be kept even when keeping them does not improve classification efficiency. We thus obtain a consistent framework for both building and pruning decision trees in uncertain domains. We give typical examples from medicine, highlighting the routine use of induction in this domain even when, for many cases, the targeted diagnosis cannot be reached from the findings under investigation.
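The abstract states that the C.M. family contains the most commonly used attribute selection criteria; two standard members of that common set are Shannon entropy (used by ID3/C4.5) and the Gini index (used by CART). As an illustrative sketch only (not the paper's own code, and not the general C.M. definition), the following shows how either criterion scores a candidate split by the impurity decrease it achieves:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a class-label sequence (criterion of ID3/C4.5)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity of a class-label sequence (criterion of CART)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_gain(labels, groups, impurity):
    """Impurity decrease obtained by splitting `labels` into `groups`."""
    n = len(labels)
    return impurity(labels) - sum(len(g) / n * impurity(g) for g in groups)

# A binary attribute splitting 8 cases of two classes into two groups:
parent = ["a"] * 4 + ["b"] * 4
left, right = ["a", "a", "a", "b"], ["a", "b", "b", "b"]
print(round(split_gain(parent, [left, right], entropy), 3))  # entropy gain
print(round(split_gain(parent, [left, right], gini), 3))     # Gini gain
```

Both criteria rank this split as an improvement over the unsplit node; the paper's observation is that, across many data sets, such criteria tend to rank candidate attributes similarly.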
© 1997 Springer-Verlag Berlin Heidelberg
Crémilleux, B., Robert, C. (1997). A theoretical framework for decision trees in uncertain domains: Application to medical data sets. In: Keravnou, E., Garbay, C., Baud, R., Wyatt, J. (eds) Artificial Intelligence in Medicine. AIME 1997. Lecture Notes in Computer Science, vol 1211. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0029447
Print ISBN: 978-3-540-62709-8
Online ISBN: 978-3-540-68448-0