An Experimental Study about Simple Decision Trees for Bagging Ensemble on Datasets with Classification Noise

  • Conference paper
Symbolic and Quantitative Approaches to Reasoning with Uncertainty (ECSQARU 2009)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5590)

Abstract

Decision trees are simple structures used in supervised classification learning. Their classification results can be notably improved with ensemble methods such as Bagging, Boosting, or Randomization, which are widely used in the literature; Bagging, in particular, outperforms Boosting and Randomization in situations with classification noise. In this paper, we present an experimental study of different simple decision tree methods as base classifiers for Bagging ensembles in supervised classification, showing that simple credal decision trees (based on imprecise probabilities and uncertainty measures) outperform classical decision tree methods in this type of procedure when applied to datasets with classification noise.

This work has been jointly supported by the Spanish Ministry of Education and Science under project TIN2007-67418-C03-03, by the European Regional Development Fund (FEDER), and by the FPU scholarship programme (AP2004-4678).
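The credal trees referenced in the abstract replace Shannon entropy, as a split criterion, with the maximum entropy over a credal set obtained from the Imprecise Dirichlet Model (IDM). The Python sketch below illustrates the two ingredients under stated assumptions: a water-filling computation of the IDM maximum-entropy distribution, and a plain Bagging loop with majority voting. It is a minimal illustration, not the authors' implementation; the function names (`upper_entropy_gain`, `bag`, `vote`) and the default IDM parameter `s = 1` are illustrative choices.

```python
import numpy as np
from collections import Counter


def max_entropy_distribution(counts, s=1.0):
    """Maximum-entropy distribution over the IDM credal set.

    For class counts n_c and IDM parameter s, the credal set is
    {p : p_c >= n_c / (N + s), sum_c p_c = 1}.  Its maximum-entropy
    element 'water-fills' the free mass s / (N + s) onto the smallest
    lower bounds until they reach a common level.
    """
    counts = np.asarray(counts, dtype=float)
    lower = counts / (counts.sum() + s)      # IDM lower probabilities
    budget = s / (counts.sum() + s)          # mass not fixed by the bounds
    p, order = lower.copy(), np.argsort(lower)
    for k in range(1, len(order) + 1):
        # Tentatively spread the whole budget over the k smallest entries.
        level = (lower[order[:k]].sum() + budget) / k
        if k == len(order) or level <= lower[order[k]]:
            p[order[:k]] = level             # feasible: stop water-filling
            break
    return p


def upper_entropy(counts, s=1.0):
    """Shannon entropy (in bits) of the maximum-entropy element."""
    p = max_entropy_distribution(counts, s)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())


def upper_entropy_gain(y, x, s=1.0):
    """Credal split score: drop in upper entropy when branching on x."""
    labels = sorted(set(y))

    def counts(ys):
        return [sum(1 for v in ys if v == c) for c in labels]

    children = 0.0
    for value in set(x):
        sub = [yi for yi, xi in zip(y, x) if xi == value]
        children += len(sub) / len(y) * upper_entropy(counts(sub), s)
    return upper_entropy(counts(y), s) - children


def bag(fit, X, y, n_models=100, seed=0):
    """Bagging: train one base model per bootstrap resample of (X, y).

    `fit(X, y)` must return a predictor, i.e. a callable mapping X -> labels.
    """
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(y), size=len(y))   # sample with replacement
        models.append(fit(X[idx], y[idx]))
    return models


def vote(models, X):
    """Ensemble prediction by majority vote over the base models."""
    preds = np.array([m(X) for m in models])
    return np.array([Counter(col).most_common(1)[0][0] for col in preds.T])
```

A credal tree would grow by selecting, at each node, the attribute with the largest `upper_entropy_gain` and stopping when no attribute yields a positive gain; Bagging then votes over such trees fitted to bootstrap samples. Note that with s = 0 the criterion in this sketch reduces to the classical information gain, which is how the credal and classical base trees compared in the paper relate.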

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Abellán, J., Masegosa, A.R. (2009). An Experimental Study about Simple Decision Trees for Bagging Ensemble on Datasets with Classification Noise. In: Sossai, C., Chemello, G. (eds) Symbolic and Quantitative Approaches to Reasoning with Uncertainty. ECSQARU 2009. Lecture Notes in Computer Science (LNAI), vol 5590. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02906-6_39

  • DOI: https://doi.org/10.1007/978-3-642-02906-6_39

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-02905-9

  • Online ISBN: 978-3-642-02906-6

  • eBook Packages: Computer Science (R0)
