Abstract
We propose an algorithm for building decision trees when the observed data are probability distributions. This is of interest when dealing with massive databases or with probabilistic models. We illustrate the method on a dataset describing districts of Great Britain, where the resulting decision tree yields rules that explain the unemployment rate.
In our setting, the tree is built by replacing the test X > α, which splits the nodes in the usual case of real-valued data, with the test P(X > α) < β, where α and β are determined by an algorithm based on probabilistic split evaluation criteria.
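The split test P(X > α) < β can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes each observation is given as a finite sample from its distribution, so that P(X > α) is estimated empirically. It also uses weighted Gini impurity as a stand-in for the split evaluation criteria, which the abstract does not specify. All function names (`tail_probability`, `split_node`, `best_split`) and the candidate grids for α and β are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def tail_probability(sample: Sequence[float], alpha: float) -> float:
    """Empirical estimate of P(X > alpha) for one observation given as a sample."""
    return sum(1 for x in sample if x > alpha) / len(sample)


def split_node(
    observations: Sequence[Sequence[float]], alpha: float, beta: float
) -> Tuple[List[int], List[int]]:
    """Send observation i to the left child if P(X_i > alpha) < beta, else right."""
    left: List[int] = []
    right: List[int] = []
    for i, sample in enumerate(observations):
        if tail_probability(sample, alpha) < beta:
            left.append(i)
        else:
            right.append(i)
    return left, right


def gini(labels: List[str]) -> float:
    """Gini impurity of a list of class labels (assumed criterion, not from the paper)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(
    observations: Sequence[Sequence[float]],
    labels: Sequence[str],
    alphas: Sequence[float],
    betas: Sequence[float],
) -> Optional[Tuple[float, float, float]]:
    """Exhaustive search over candidate (alpha, beta) pairs.

    Returns (weighted impurity, alpha, beta) for the best pair,
    or None if no pair produces two non-empty children.
    """
    best: Optional[Tuple[float, float, float]] = None
    n = len(labels)
    for alpha in alphas:
        for beta in betas:
            left, right = split_node(observations, alpha, beta)
            if not left or not right:
                continue  # a split must produce two non-empty children
            score = (
                len(left) * gini([labels[i] for i in left])
                + len(right) * gini([labels[i] for i in right])
            ) / n
            if best is None or score < best[0]:
                best = (score, alpha, beta)
    return best


# Toy data: each observation is a small sample standing in for a distribution.
observations = [[1, 2, 3, 4], [1, 1, 2, 2], [5, 6, 7, 8], [4, 5, 6, 9]]
labels = ["low", "low", "high", "high"]
result = best_split(observations, labels, alphas=[3], betas=[0.5])
```

With the toy data above, the single candidate pair α = 3, β = 0.5 separates the two classes exactly. In practice the candidate grids would be derived from the data, and the criterion would be whichever probabilistic split measure the method prescribes.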
© 2000 Springer-Verlag Berlin Heidelberg
Cite this paper
Aboa, JP., Emilion, R. (2000). Decision trees for probabilistic data. In: Kambayashi, Y., Mohania, M., Tjoa, A.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2000. Lecture Notes in Computer Science, vol 1874. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44466-1_39
DOI: https://doi.org/10.1007/3-540-44466-1_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67980-6
Online ISBN: 978-3-540-44466-4
eBook Packages: Springer Book Archive