Abstract
Decision trees, a popular choice for classification, are limited in their ability to provide probability estimates: the raw class frequencies at the leaves are systematically biased, so smoothing methods such as the Laplace or m-estimate correction are typically applied at the leaves. In this work, we show that an ensemble of decision trees significantly improves the quality of the probability estimates produced at the leaves, overcoming the myopia of the frequency-based estimates. We demonstrate the effectiveness of probabilistic decision trees as part of the Predictive Uncertainty Challenge, and we extend the study with three additional, highly imbalanced datasets. The ensemble methods significantly improve not only the quality of the probability estimates but also the AUC on the imbalanced datasets.
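For reference, the two leaf-smoothing corrections named in the abstract take their standard forms. With k of the N training examples at a leaf belonging to the class of interest, C classes, base rate b, and smoothing parameter m:

\[ p_{\mathrm{Laplace}} = \frac{k + 1}{N + C}, \qquad p_{m\text{-}estimate} = \frac{k + m\,b}{N + m}. \]

The ensemble estimate is, in standard bagging, simply the average of the individual trees' leaf estimates. A minimal sketch of that idea, assuming scikit-learn's BaggingClassifier and DecisionTreeClassifier (a stand-in for illustration; the paper's own bagging implementation may differ):

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Small imbalanced toy problem (9:1), standing in for the paper's datasets.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# A single unpruned tree: raw leaf frequencies give coarse, extreme estimates.
tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Bagging 25 such trees: predict_proba averages the trees' leaf estimates,
# smoothing the probabilities without any explicit leaf correction.
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=25,
                        random_state=0).fit(X, y)

print(tree.predict_proba(X[:3]))  # 0/1 extremes on training points
print(bag.predict_proba(X[:3]))   # smoother, intermediate probabilities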
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chawla, N.V. (2006). Many Are Better Than One: Improving Probabilistic Estimates from Decision Trees. In: Quiñonero-Candela, J., Dagan, I., Magnini, B., d’Alché-Buc, F. (eds.) Machine Learning Challenges. Evaluating Predictive Uncertainty, Visual Object Classification, and Recognising Tectual Entailment. MLCW 2005. Lecture Notes in Computer Science, vol. 3944. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11736790_4
DOI: https://doi.org/10.1007/11736790_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33427-9
Online ISBN: 978-3-540-33428-6
eBook Packages: Computer Science, Computer Science (R0)