
Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3944)

Included in the following conference series: MLCW (Machine Learning Challenges Workshop)

Abstract

Decision trees are a popular choice for classification, but they are limited in the quality of the probability estimates they produce, which therefore require smoothing at the leaves. Typically, smoothing methods such as the Laplace or m-estimate corrections are applied at the leaves to overcome the systematic bias of frequency-based estimates. In this work, we show that an ensemble of decision trees significantly improves the quality of the probability estimates produced at the leaves: the ensemble overcomes the myopia of the frequency-based leaf estimates. We demonstrate the effectiveness of probabilistic decision trees as part of the Predictive Uncertainty Challenge, and we also include three additional highly imbalanced datasets in our study. We show that the ensemble methods significantly improve not only the quality of the probability estimates but also the AUC on the imbalanced datasets.
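The two ingredients described in the abstract can be illustrated with a short sketch. The following is a minimal, hypothetical Python example, not the paper's implementation: it applies the Laplace correction (and shows the m-estimate formula) to the leaf frequencies of a scikit-learn decision tree, then averages the smoothed leaf estimates over a small bagged ensemble. The synthetic imbalanced dataset, the ensemble size of 25 trees, and the m value of 2 are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import resample

def laplace_smooth(counts):
    """Laplace correction: (k + 1) / (n + C) for each of the C classes at a leaf."""
    counts = np.asarray(counts, dtype=float)
    return (counts + 1.0) / (counts.sum() + len(counts))

def m_estimate(counts, priors, m=2.0):
    """m-estimate correction: (k + m * prior) / (n + m); m = 2 is an arbitrary choice here."""
    counts = np.asarray(counts, dtype=float)
    return (counts + m * np.asarray(priors, dtype=float)) / (counts.sum() + m)

def smoothed_tree_proba(tree, X_fit, y_fit, X_eval, n_classes):
    """Replace a fitted tree's raw leaf frequencies with Laplace-smoothed estimates."""
    leaves_fit = tree.apply(X_fit)    # leaf id of each training point
    leaves_eval = tree.apply(X_eval)  # leaf id of each evaluation point
    proba = np.zeros((len(X_eval), n_classes))
    for leaf in np.unique(leaves_eval):
        counts = np.bincount(y_fit[leaves_fit == leaf], minlength=n_classes)
        proba[leaves_eval == leaf] = laplace_smooth(counts)
    return proba

# Illustrative, highly imbalanced synthetic data (95% / 5% class split).
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
X_train, X_test = X[:1500], X[1500:]
y_train, y_test = y[:1500], y[1500:]

# Bagged ensemble of unpruned trees; the final probability estimate is the mean
# of the per-tree smoothed leaf estimates, the idea the abstract advocates.
n_trees = 25
ensemble_proba = np.zeros((len(X_test), 2))
for i in range(n_trees):
    X_boot, y_boot = resample(X_train, y_train, random_state=i)
    tree = DecisionTreeClassifier(random_state=i).fit(X_boot, y_boot)
    ensemble_proba += smoothed_tree_proba(tree, X_boot, y_boot, X_test, n_classes=2)
ensemble_proba /= n_trees
```

Under these assumptions, the averaged estimates for the minority class are typically smoother and less extreme than those of any single tree, which is the effect the paper measures on the Challenge data and on the additional imbalanced datasets.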




Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chawla, N.V. (2006). Many Are Better Than One: Improving Probabilistic Estimates from Decision Trees. In: Quiñonero-Candela, J., Dagan, I., Magnini, B., d’Alché-Buc, F. (eds) Machine Learning Challenges. Evaluating Predictive Uncertainty, Visual Object Classification, and Recognising Tectual Entailment. MLCW 2005. Lecture Notes in Computer Science, vol. 3944. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11736790_4


  • DOI: https://doi.org/10.1007/11736790_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-33427-9

  • Online ISBN: 978-3-540-33428-6

  • eBook Packages: Computer Science, Computer Science (R0)
