An extensive empirical comparison of ensemble learning methods for binary classification

  • Short Paper
  • Published:
Pattern Analysis and Applications

Abstract

We present an extensive empirical comparison of nineteen prototypical supervised ensemble learning algorithms, including Boosting, Bagging, Random Forests, Rotation Forests, Arc-X4, Class-Switching and their variants, as well as more recent techniques such as Random Patches. The algorithms were compared in terms of threshold, ranking/ordering and probability metrics over nineteen UCI benchmark data sets with binary labels. We also examine the influence of two base learners, CART and Extremely Randomized Trees, on the bias–variance decomposition, and the effect of calibrating the models via Isotonic Regression on each performance metric. The selected data sets have already been used in various empirical studies and cover different application domains. The source code and the detailed results of our study are publicly available.
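The isotonic-regression calibration mentioned in the abstract is commonly fitted with the pool-adjacent-violators algorithm (PAVA). The following minimal sketch is our own illustration of that idea, not the paper's code; the function and variable names are invented for this example:

```python
# Illustrative pool-adjacent-violators algorithm (PAVA) for isotonic
# calibration of binary-classifier scores. Minimal sketch only; names
# are assumptions, not taken from the paper's released code.

def pava_calibrate(labels_sorted_by_score):
    """Fit a non-decreasing step function to 0/1 labels that have been
    sorted in ascending order of the classifier's raw score, and return
    the calibrated probability assigned to each example."""
    # Each block stores [sum of labels, count]. Adjacent blocks whose
    # means violate monotonicity are pooled until the block means are
    # non-decreasing from left to right.
    blocks = []
    for y in labels_sorted_by_score:
        blocks.append([float(y), 1])
        # Pool while the previous block's mean >= the current block's mean
        # (compared by cross-multiplication to avoid division).
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] >= blocks[-1][0] * blocks[-2][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    # Expand each pooled block back to one probability per example.
    calibrated = []
    for total, count in blocks:
        calibrated.extend([total / count] * count)
    return calibrated
```

For example, `pava_calibrate([0, 0, 1, 0, 1, 1])` returns `[0.0, 0.0, 0.5, 0.5, 1.0, 1.0]`: the out-of-order label at position four is pooled with its neighbour so the calibrated probabilities stay monotone in the score.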



Author information


Corresponding author

Correspondence to Haytham Elghazel.

Appendix

This section provides the tables that present the results of the experiments for each ensemble method on each data set, for both uncalibrated and calibrated models. Due to space limitations, the tables are presented in landscape form. More specifically, Tables 12, 13 and 14 present the classification accuracies, the AUC and the RMS, respectively, for the uncalibrated models; Tables 15, 16 and 17 present the same results for the calibrated models. The differences in performance between methods, in terms of win/tie/loss records, are depicted in Tables 18, 20 and 22 for the uncalibrated models and in Tables 19, 21 and 23 for the calibrated ones. Finally, Fig. 10 displays the relative variations of \(\kappa\) and accuracy when the baseline classification model is changed.

Table 12 Classification accuracy and standard deviation of CART and ensemble methods
Table 13 AUC and standard deviation of CART and ensemble methods
Table 14 1-RMS and standard deviation of CART and ensemble methods
Table 15 Accuracy and standard deviation of calibrated CART and ensemble methods
Table 16 AUC and standard deviation of calibrated CART and ensemble methods
Table 17 1-RMS and standard deviation of calibrated CART and ensemble methods
Table 18 Pairwise t test comparisons of the first group of uncalibrated models in terms of accuracy
Table 19 Pairwise t test comparisons of the first group of calibrated models in terms of accuracy
Table 20 Pairwise t test comparisons of the first group of uncalibrated models in terms of AUC
Table 21 Pairwise t test comparisons of the first group of calibrated models in terms of AUC
Table 22 Pairwise t test comparisons of the first group of uncalibrated models in terms of RMS
Table 23 Pairwise t test comparisons of the first group of calibrated models in terms of RMS
Fig. 10

\(\kappa\)-error relative movement diagrams for standard ensemble approaches and their ET-variants on different data sets. x-axis: \(\kappa\); y-axis: \(e_{i,j}\), the average error of the pair of classifiers. (01) Rot; (02) Bag; (03) Ad; (05) Rotb; (06) ArcX4; (08) Swt; (09) RadP; (10) Vad; (11) RotET; (12) BagET; (13) AdET; (14) RotbET; (15) ArcX4ET; (16) SwtET; (17) RadPET; (18) VadET
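The \(\kappa\) plotted on the x-axis of these diagrams is the pairwise Cohen's kappa, which measures the agreement between two ensemble members beyond what chance would produce. A minimal sketch of how it can be computed from two prediction vectors (our own code; the function name is an assumption, not from the paper):

```python
# Pairwise Cohen's kappa between two classifiers' predictions, as used on
# the x-axis of kappa-error diagrams. Illustrative sketch only.

def pairwise_kappa(pred_a, pred_b):
    """Return (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(pred_a)
    # Fraction of examples on which the two classifiers agree.
    observed = sum(a == b for a, b in zip(pred_a, pred_b)) / n
    # Chance agreement from each classifier's marginal label frequencies.
    chance = 0.0
    for label in set(pred_a) | set(pred_b):
        freq_a = sum(p == label for p in pred_a) / n
        freq_b = sum(p == label for p in pred_b) / n
        chance += freq_a * freq_b
    if chance == 1.0:  # both classifiers constant and identical
        return 1.0
    return (observed - chance) / (1.0 - chance)
```

Identical prediction vectors give \(\kappa = 1\), while agreement no better than chance gives \(\kappa = 0\); in the diagrams, pairs of accurate yet diverse classifiers appear toward the lower left.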

About this article

Cite this article

Narassiguin, A., Bibimoune, M., Elghazel, H. et al. An extensive empirical comparison of ensemble learning methods for binary classification. Pattern Anal Applic 19, 1093–1128 (2016). https://doi.org/10.1007/s10044-016-0553-z
