Abstract
Decision trees, a popular choice for classification, are limited in their ability to provide probability estimates: the raw class frequencies at the leaves are systematically biased, so smoothing methods such as the Laplace or m-estimate correction are typically applied at the leaves. In this work, we show that an ensemble of decision trees significantly improves the quality of the probability estimates produced at the leaves, overcoming the myopia of the frequency-based estimates. We demonstrate the effectiveness of probabilistic decision trees as part of the Predictive Uncertainty Challenge, and we extend the study with three additional, highly imbalanced datasets. The ensemble methods significantly improve not only the quality of the probability estimates but also the AUC on the imbalanced datasets.
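For reference, the two leaf-smoothing corrections named in the abstract take their standard forms. With k of the N training examples at a leaf belonging to the class of interest, C classes, base rate b, and smoothing parameter m:

\[ p_{\mathrm{Laplace}} = \frac{k + 1}{N + C}, \qquad p_{m\text{-}estimate} = \frac{k + m\,b}{N + m}. \]

The ensemble estimate is, in standard bagging, simply the average of the individual trees' leaf estimates. A minimal sketch of that idea, assuming scikit-learn's BaggingClassifier and DecisionTreeClassifier (a stand-in for illustration; the paper's own bagging implementation may differ):

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Small imbalanced toy problem (9:1), standing in for the paper's datasets.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# A single unpruned tree: raw leaf frequencies give coarse, extreme estimates.
tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Bagging 25 such trees: predict_proba averages the trees' leaf estimates,
# smoothing the probabilities without any explicit leaf correction.
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=25,
                        random_state=0).fit(X, y)

print(tree.predict_proba(X[:3]))  # 0/1 extremes on training points
print(bag.predict_proba(X[:3]))   # smoother, intermediate probabilities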
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chawla, N.V. (2006). Many Are Better Than One: Improving Probabilistic Estimates from Decision Trees. In: Quiñonero-Candela, J., Dagan, I., Magnini, B., d’Alché-Buc, F. (eds.) Machine Learning Challenges. Evaluating Predictive Uncertainty, Visual Object Classification, and Recognising Tectual Entailment. MLCW 2005. Lecture Notes in Computer Science, vol. 3944. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11736790_4
DOI: https://doi.org/10.1007/11736790_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33427-9
Online ISBN: 978-3-540-33428-6
eBook Packages: Computer Science, Computer Science (R0)