An Empirical Methodology to Analyze the Behavior of Bagging

Pinto, Fábio; Soares, Carlos; Mendes-Moreira, João

doi:10.1007/978-3-319-14717-8_16

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8933)

Abstract

In this paper we propose and apply a methodology to study the relationship between the performance of bagging and the characteristics of the bootstrap samples. The methodology consists of (1) an extensive set of experiments to estimate the empirical distribution of performance of the population of all possible ensembles that can be created with those bootstraps and (2) a metalearning approach to analyze that distribution based on characteristics of the bootstrap samples and their relationship with the complete training set. Given the large size of the population of all ensembles, we empirically show that it is possible to apply the methodology to a sample. We applied the methodology to 53 classification datasets for ensembles of 20 and 100 models. Our results show that diversity is crucial for a bootstrap to be important, and we present evidence of a metric that can measure diversity without any learning process involved. We also found evidence that the best bootstraps have a predictive power very similar to that of the training set when naive models are used.
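The first step of the methodology can be illustrated with a minimal sketch: build a pool of bootstrap samples, train one model per bootstrap, and estimate the distribution of ensemble performance from a random sample of ensembles rather than from the full population. Everything concrete here is an assumption made for illustration and is not taken from the paper: the synthetic one-dimensional dataset, the decision stump used as base learner, majority voting, and the pool and ensemble sizes (the paper reports ensembles of 20 and 100 models on 53 datasets).

```python
import random

def make_data(n, rng):
    """Synthetic 1-D binary data (assumption): label is 1 when x > 0.5, with 10% label noise."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < 0.1:
            y = 1 - y
        data.append((x, y))
    return data

def fit_stump(sample):
    """Naive base learner (assumption): threshold with the fewest errors on the sample."""
    best_t, best_err = 0.5, len(sample) + 1
    for t, _ in sample:
        err = sum(int(x > t) != y for x, y in sample)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def ensemble_accuracy(thresholds, test):
    """Accuracy of a majority vote over stumps; ties go to class 1."""
    correct = 0
    for x, y in test:
        votes = sum(int(x > t) for t in thresholds)
        pred = int(2 * votes >= len(thresholds))
        correct += pred == y
    return correct / len(test)

def sample_ensemble_accuracies(n_bootstraps=30, ensemble_size=5,
                               n_ensembles=200, seed=0):
    """Estimate the accuracy distribution from a random sample of ensembles."""
    rng = random.Random(seed)
    train = make_data(100, rng)
    test = make_data(200, rng)
    # Pool of models, one per bootstrap sample drawn with replacement.
    pool = [fit_stump([rng.choice(train) for _ in train])
            for _ in range(n_bootstraps)]
    # Sample ensembles instead of enumerating every subset of the pool.
    accs = []
    for _ in range(n_ensembles):
        members = rng.sample(pool, ensemble_size)
        accs.append(ensemble_accuracy(members, test))
    return accs

if __name__ == "__main__":
    accs = sample_ensemble_accuracies()
    print(f"min={min(accs):.3f} mean={sum(accs) / len(accs):.3f} max={max(accs):.3f}")
```

Sampling matters because the population grows combinatorially: even this toy pool of 30 bootstraps admits 142,506 distinct ensembles of size 5, so the sketch evaluates only 200 of them. The resulting list of accuracies is the kind of empirical distribution that a metalearning step could then relate to characteristics of the bootstrap samples.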

Author information

Authors and Affiliations

INESC TEC/Faculdade de Engenharia, Universidade do Porto, Rua Dr. Roberto Frias, s/n, Porto, Portugal, 4200-465
Fábio Pinto, Carlos Soares & João Mendes-Moreira

Editor information

Editors and Affiliations

Sun Yat-sen University, Guangzhou, P.R. China
Xudong Luo
The Chinese University of Hong Kong, Shatin, N.T., Hong Kong
Jeffrey Xu Yu
Guangxi Normal University, Guilin, P.R. China
Zhi Li

Cite this paper

Pinto, F., Soares, C., Mendes-Moreira, J. (2014). An Empirical Methodology to Analyze the Behavior of Bagging. In: Luo, X., Yu, J.X., Li, Z. (eds) Advanced Data Mining and Applications. ADMA 2014. Lecture Notes in Computer Science, vol 8933. Springer, Cham. https://doi.org/10.1007/978-3-319-14717-8_16

DOI: https://doi.org/10.1007/978-3-319-14717-8_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14716-1
Online ISBN: 978-3-319-14717-8
eBook Packages: Computer Science, Computer Science (R0)