Bounds on the moments for an ensemble of random decision trees

Dhurandhar, Amit

doi:10.1007/s10115-014-0768-5

Regular Paper
Published: 19 July 2014

Volume 44, pages 279–298, (2015)
Knowledge and Information Systems

Amit Dhurandhar¹

Abstract

An ensemble of random decision trees is a popular classification technique, especially known for its ability to scale to large domains. In this paper, we provide an efficient strategy to compute bounds on the moments of the generalization error computed over all datasets of a particular size drawn from an underlying distribution, for this classification technique. Being able to estimate these moments can help us gain insights into the performance of this model. As we will see in the experimental section, these bounds tend to be significantly tighter than the state-of-the-art Breiman’s bounds based on strength and correlation and hence more useful in practice.

Notes

These probabilities and \( P\left[ Y(x)\!\ne \!y \right] \) are conditioned on \(x\). We omit the explicit conditioning to improve readability, since it is clear from the context.
For further details refer to [3] and [5].
This is after splitting the continuous attributes.
Partitioned into three categories: high, medium, and low.

References

1. Anandkumar A, Foster D, Hsu D, Kakade S, Liu Y (2012) A spectral algorithm for latent Dirichlet allocation. In: NIPS. Lake Tahoe, USA, pp 926–934
2. Boots B, Gordon G (2012) Two manifold problems with applications to nonlinear system identification. In: ICML. Edinburgh, Scotland, UK, p 338
3. Breiman L (2001) Random forests. Mach Learn 45(1):5–32
4. Bshouty N, Long P (2010) Finding planted partitions in nearly linear time using arrested spectral clustering. In: ICML. Haifa, Israel, pp 135–142
5. Buttrey S, Kobayashi I (2003) On strength and correlation in random forests. In: Proceedings of the 2003 joint statistical meetings, section on statistical computing
6. Connor-Linton J (2003) Chi square tutorial. http://www.georgetown.edu/faculty/ballc/webtools/web_chi_tut.html
7. Dhurandhar A, Dobra A (2008) Probabilistic characterization of random decision trees. J Mach Learn Res 9:2321–2348
8. Dhurandhar A, Dobra A (2009) Semi-analytical method for analyzing models and model selection measures based on moment analysis. ACM Trans Knowl Discov Data Min
9. Dhurandhar A, Dobra A (2012) Distribution free bounds for relational classification. Knowl Inf Syst
10. Dhurandhar A, Dobra A (2012) Probabilistic characterization of nearest neighbor classifiers. Int J Mach Learn Cybern
11. Duda RO, Hart PE, Stork DG (2001) Pattern classification, 2nd edn. Wiley, New York, p 654
12. Fan W, Wang H, Yu PS, Ma S (2003) Is random model better? On its accuracy and efficiency. In: ICDM ’03: proceedings of the third IEEE international conference on data mining, IEEE Computer Society, Washington, DC, USA, pp 51–58
13. Geurts P, Ernst D, Wehenkel L (2006) Extremely randomized trees. Mach Learn 63(1):3–42
14. Hastie T, Tibshirani R, Friedman J (2001) Elements of statistical learning, 2nd edn. Springer, Berlin
15. Langford J (2005) Tutorial on practical prediction theory for classification. J Mach Learn Res 6:273–306
16. Liu F, Ting K, Fan W (2005) Maximizing tree diversity by building complete-random decision trees. In: PAKDD, pp 605–610
17. McAllester D (1999) PAC-Bayesian model averaging. In: Proceedings of the twelfth annual conference on computational learning theory. ACM Press, pp 164–170
18. McAllester D (2003) Simplified PAC-Bayesian margin bounds. In: COLT, pp 203–215
19. Roy S, Bose R (1953) Simultaneous confidence interval estimation. Ann Math Stat 24(3):513–536
20. Sison C, Glaz J (1995) Simultaneous confidence intervals and sample size determination for multinomial proportions. JASA 90(429):366–369
21. Tong Y (1980) Probabilistic inequalities for multivariate distributions, 1st edn. Academic Press, Waltham
22. Zhang K, Fan W (2008) Forecasting skewed biased stochastic ozone days: analyses, solutions and beyond. Knowl Inf Syst 14(3):299–326
23. Zhang X, Yuan Q, Zhao S, Fan W, Zheng W, Wang Z (2010) Multi-label classification without the multi-label cost. In: SDM ’10: proceedings of the SIAM conference on data mining, pp 778–789

Acknowledgments

I would like to thank the editor and the anonymous reviewers for their constructive comments. I would also like to thank Katherine Dhurandhar for proofreading the paper.

Author information

Authors and Affiliations

IBM T.J. Watson, Yorktown Heights, NY, USA
Amit Dhurandhar

Corresponding author

Correspondence to Amit Dhurandhar.

About this article

Cite this article

Dhurandhar, A. Bounds on the moments for an ensemble of random decision trees. Knowl Inf Syst 44, 279–298 (2015). https://doi.org/10.1007/s10115-014-0768-5

Received: 17 September 2013
Revised: 04 May 2014
Accepted: 03 July 2014
Published: 19 July 2014
Issue Date: August 2015
DOI: https://doi.org/10.1007/s10115-014-0768-5
