Abstract
This study compares the classification performance of a hybrid ensemble, termed the global–local hybrid ensemble, which combines both local and global learners, against data-manipulation ensembles, namely bagging and boosting variants. A comprehensive simulation study is performed on 46 data sets from the UCI machine learning repository using prediction accuracy and SAR as performance metrics, together with rigorous statistical significance tests. The simulation results indicate that the global–local hybrid ensemble outperforms or ties with the bagging and boosting variants in all cases. This suggests that the global–local hybrid ensemble has a more robust performance profile, since its performance is less sensitive to variation across problem domains, or equivalently across data sets. This robustness comes at the expense of increased complexity, since at least two types of learners, one global and one local, must be trained. A complementary diversity analysis in the classifier projection space, comparing the global–local hybrid ensemble with the base learners used in the bagging and boosting ensembles on select data sets, both explains and supports the performance-related findings of this study.
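The following is a minimal, hypothetical sketch (in Python with scikit-learn, not the authors' implementation) of the kind of comparison described above on a single data set: a global–local hybrid built by soft-voting an SVM (global learner) with a k-nearest-neighbour classifier (local learner), set against bagged and boosted decision trees, and scored by accuracy and SAR. The particular learners, the voting combiner, the data set, and the SAR definition used here, SAR = (accuracy + AUC + (1 − RMSE)) / 3, are illustrative assumptions rather than the study's exact experimental setup.

# Hedged sketch only: choices of learners, combiner, and data set are assumptions.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, VotingClassifier
from sklearn.metrics import accuracy_score, mean_squared_error, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def sar(y_true, y_prob):
    # SAR for a binary problem: mean of accuracy, AUC, and (1 - RMSE) of the
    # predicted class-1 probabilities (assumed definition).
    y_pred = (y_prob >= 0.5).astype(int)
    acc = accuracy_score(y_true, y_pred)
    auc = roc_auc_score(y_true, y_prob)
    rmse = np.sqrt(mean_squared_error(y_true, y_prob))
    return (acc + auc + (1.0 - rmse)) / 3.0


# Wisconsin breast cancer data, standing in for one of the 46 UCI data sets.
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Global-local hybrid: soft-voting combination of one global and one local learner.
hybrid = VotingClassifier(
    estimators=[
        ("global_svm", make_pipeline(StandardScaler(), SVC(probability=True))),
        ("local_knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
    ],
    voting="soft",
)

# Data-manipulation ensembles built on a single base learner (decision trees).
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)
boosting = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), n_estimators=50,
                              random_state=0)

for name, clf in [("hybrid", hybrid), ("bagging", bagging), ("boosting", boosting)]:
    clf.fit(X_tr, y_tr)
    prob = clf.predict_proba(X_te)[:, 1]
    print(f"{name:8s}  accuracy={accuracy_score(y_te, clf.predict(X_te)):.3f}"
          f"  SAR={sar(y_te, prob):.3f}")

In the study itself, such a comparison is repeated over 46 data sets and the resulting scores are then subjected to statistical significance testing; the snippet above only illustrates the per-data-set measurement step.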
Cite this article
Baumgartner, D., Serpen, G. Performance of global–local hybrid ensemble versus boosting and bagging ensembles. Int. J. Mach. Learn. & Cyber. 4, 301–317 (2013). https://doi.org/10.1007/s13042-012-0094-8