Abstract
Decision trees are simple structures used in supervised classification learning. Their classification results can be notably improved by ensemble methods such as Bagging, Boosting or Randomization, which are widely used in the literature. Bagging outperforms Boosting and Randomization in situations with classification noise. In this paper, we present an experimental study of different simple decision tree methods as base classifiers for bagging ensembles in supervised classification, showing that simple credal decision trees (based on imprecise probabilities and uncertainty measures) outperform classical decision tree methods in this type of procedure when applied to datasets with classification noise.
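To make the setting concrete, the following is a minimal sketch of bagging on noisy training labels. It is not the experimental setup of the paper. It assumes that classification noise means relabelling a random fraction of training instances, and it uses a one-split tree (a stump) as a stand-in for the simple decision trees compared in the study. The names `add_label_noise`, `fit_stump` and `bagging` are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def add_label_noise(labels, classes, rate, rng):
    """With probability `rate`, replace a label by a different class (assumed noise model)."""
    noisy = []
    for y in labels:
        if rng.random() < rate:
            noisy.append(rng.choice([c for c in classes if c != y]))
        else:
            noisy.append(y)
    return noisy


def fit_stump(X, y):
    """Fit a one-split tree that minimises training errors (stand-in base learner)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0] if right else l_lab
            errors = sum(yi != l_lab for yi in left) + sum(yi != r_lab for yi in right)
            if best is None or errors < best[0]:
                best = (errors, j, t, l_lab, r_lab)
    _, j, t, l_lab, r_lab = best
    return lambda row: l_lab if row[j] <= t else r_lab


def bagging(X, y, n_models, rng, fit=fit_stump):
    """Train `n_models` base learners on bootstrap samples; predict by majority vote."""
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n items with replacement
        models.append(fit([X[i] for i in idx], [y[i] for i in idx]))

    def predict(row):
        votes = Counter(m(row) for m in models)
        return votes.most_common(1)[0][0]

    return predict


# Toy one-dimensional problem: class 0 for values below 10, class 1 otherwise.
rng = random.Random(0)
X = [[i] for i in range(20)]
y = [0] * 10 + [1] * 10

# Noise is added to the training labels only; accuracy is measured on clean labels.
noisy_y = add_label_noise(y, classes=[0, 1], rate=0.1, rng=rng)
predict = bagging(X, noisy_y, n_models=25, rng=rng)
accuracy = sum(predict(row) == label for row, label in zip(X, y)) / len(X)
print(f"accuracy against clean labels: {accuracy:.2f}")
```

Passing a different `fit` function to `bagging` would be one way to compare base learners under the same noise rate, which is the kind of comparison the study describes.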
This work has been jointly supported by the Spanish Ministry of Education and Science under project TIN2007-67418-C03-03, by the European Regional Development Fund (FEDER), and by the FPU scholarship programme (AP2004-4678).
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Abellán, J., Masegosa, A.R. (2009). An Experimental Study about Simple Decision Trees for Bagging Ensemble on Datasets with Classification Noise. In: Sossai, C., Chemello, G. (eds) Symbolic and Quantitative Approaches to Reasoning with Uncertainty. ECSQARU 2009. Lecture Notes in Computer Science, vol 5590. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02906-6_39
DOI: https://doi.org/10.1007/978-3-642-02906-6_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02905-9
Online ISBN: 978-3-642-02906-6
eBook Packages: Computer Science, Computer Science (R0)