Consistency of randomized and finite sized decision tree ensembles

Ahmad, Amir; Halawani, Sami M.; Albidewi, Ibrahim A.

doi:10.1007/s10044-011-0260-8

Theoretical Advances
Published: 15 December 2011

Pattern Analysis and Applications, volume 17, pages 97–104 (2014)

Amir Ahmad¹,
Sami M. Halawani¹ &
Ibrahim A. Albidewi²

Abstract

Regression via classification (RvC) is a method in which a regression problem is converted into a classification problem. A discretization process is used to convert continuous target values into classes, and the discretized data can then be handled by a classifier. In this paper, we use a discretization method, Extreme Randomized Discretization, in which bin boundaries are created randomly, to build ensembles, and we present an ensemble method for RvC problems. We show theoretically, for a set of problems, that if the number of bins is three, the proposed ensembles for RvC perform better than RvC with the equal-width discretization method. We use these results to show that infinite-sized ensembles consisting of finite-sized decision trees, created by a pure randomized method (split points are created randomly), are not consistent. We also show theoretically, using a set of regression problems, that the performance of these ensembles depends on the size of the member decision trees.
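The RvC ensemble idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, the member classifier is a 1-nearest-neighbour stand-in rather than a decision tree, and decoding a predicted class to the mean target of its bin is an assumption, since the abstract does not state how classes are mapped back to values.

```python
import numpy as np


def fit_member(X, y, n_bins, rng):
    """Fit one RvC member using randomly placed bin boundaries."""
    # Draw n_bins - 1 boundaries uniformly between min(y) and max(y).
    cuts = np.sort(rng.uniform(y.min(), y.max(), size=n_bins - 1))
    labels = np.digitize(y, cuts)  # class index in 0 .. n_bins - 1
    edges = np.concatenate(([y.min()], cuts, [y.max()]))
    reps = np.empty(n_bins)
    for k in range(n_bins):
        in_bin = labels == k
        # Assumption: a class is decoded to the mean target of its bin,
        # or to the bin midpoint when no training point falls in it.
        reps[k] = y[in_bin].mean() if in_bin.any() else 0.5 * (edges[k] + edges[k + 1])
    return X, labels, reps


def predict_member(member, X_new):
    X_train, labels, reps = member
    # Stand-in classifier (assumption): 1-nearest neighbour, not a decision tree.
    dist = ((X_new[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    predicted_class = labels[dist.argmin(axis=1)]
    return reps[predicted_class]


def fit_rvc_ensemble(X, y, n_bins, n_members, rng):
    # Each member gets its own independent random discretization of y.
    return [fit_member(X, y, n_bins, rng) for _ in range(n_members)]


def predict_rvc_ensemble(members, X_new):
    # The ensemble prediction is the average of the members' decoded values.
    return np.mean([predict_member(m, X_new) for m in members], axis=0)


rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(50, 1))
y = X[:, 0]  # toy target y = x, chosen only for illustration
members = fit_rvc_ensemble(X, y, n_bins=3, n_members=25, rng=rng)
pred = predict_rvc_ensemble(members, X)
```

Because every member uses different random boundaries, averaging their decoded outputs yields a finer-grained prediction than any single three-bin member can produce on its own.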

Author information

Authors and Affiliations

Faculty of Computing and Information Technology, King Abdulaziz University, Rabigh, Saudi Arabia
Amir Ahmad & Sami M. Halawani
Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
Ibrahim A. Albidewi

Corresponding author

Correspondence to Amir Ahmad.

About this article

Cite this article

Ahmad, A., Halawani, S.M. & Albidewi, I.A. Consistency of randomized and finite sized decision tree ensembles. Pattern Anal Applic 17, 97–104 (2014). https://doi.org/10.1007/s10044-011-0260-8

Received: 05 May 2011
Accepted: 29 November 2011
Published: 15 December 2011
Issue Date: February 2014
DOI: https://doi.org/10.1007/s10044-011-0260-8
