
Classification Performance of Bagging and Boosting Type Ensemble Methods with Small Training Sets


Abstract

The classification performance of an ensemble method can be understood by studying the bias and variance contributions to its classification error. Statistically, the bias and variance of a single classifier are controlled by the size of the training set and the complexity of the classifier. It has been established, both theoretically and empirically, that the classification performance (and hence the bias and variance) of a single classifier can be partially improved by using a suitable ensemble of that classifier and resampling the original training set. In this paper, we empirically examine the bias-variance decomposition of three different types of ensemble methods trained on samples containing from 10% to at most about 63% of the observations in the original training set. The first ensemble is bagging, the second is a boosting-type ensemble, AdaBoost, and the third is a bagging-type hybrid ensemble called bundling. All ensembles are trained on samples constructed with small subsampling ratios (SSR) of 0.10, 0.20, 0.30, 0.40 and 0.50, as well as with bootstrapping. The experiments are carried out on 20 datasets from the UCI Machine Learning Repository and are designed first to find the optimal training sample size (smaller than the original training set) for each ensemble, and then to identify the ensemble with the best bias-variance performance on these smaller training sets. The bias-variance decomposition of bundling shows that, with small subsamples, this ensemble method has significantly lower bias and variance than the subsampled and bootstrapped versions of bagging and AdaBoost.
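
As a rough illustration of the experimental setup described above, the following sketch trains subsampled bagging, bootstrapped bagging, and AdaBoost on a reduced training sample using scikit-learn. It is not the authors' implementation: bundling and the bias-variance decomposition are omitted, a built-in dataset stands in for the 20 UCI datasets, and the variable names are illustrative; only the SSR values mirror those used in the paper.

```python
# Illustrative sketch only (not the authors' code): subsampled bagging,
# bootstrapped bagging, and AdaBoost on a reduced training set, using
# scikit-learn. Bundling and the bias-variance decomposition are omitted,
# and a built-in dataset stands in for the 20 UCI datasets.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Small subsampling ratios (SSR) considered in the paper.
ssr_values = [0.10, 0.20, 0.30, 0.40, 0.50]

for ssr in ssr_values:
    # Subsampled bagging: each base tree sees only a fraction `ssr` of the
    # training observations, drawn without replacement.
    bag = BaggingClassifier(
        n_estimators=100,
        max_samples=ssr,   # fraction of training observations per base learner
        bootstrap=False,   # subsampling, i.e. sampling without replacement
        random_state=0,
    ).fit(X_train, y_train)
    print(f"subsampled bagging, SSR={ssr:.2f}: "
          f"test accuracy = {bag.score(X_test, y_test):.3f}")

# Ordinary bootstrapped bagging for comparison: sampling with replacement,
# so roughly 63% of the observations are unique in each bootstrap sample.
boot = BaggingClassifier(
    n_estimators=100, max_samples=1.0, bootstrap=True, random_state=0,
).fit(X_train, y_train)
print(f"bootstrapped bagging: test accuracy = {boot.score(X_test, y_test):.3f}")

# AdaBoost trained on a 30% subsample of the training set, one way to mimic
# a boosting ensemble built from a small training sample.
X_sub, _, y_sub, _ = train_test_split(
    X_train, y_train, train_size=0.30, random_state=0)
ada = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X_sub, y_sub)
print(f"AdaBoost on a 30% subsample: "
      f"test accuracy = {ada.score(X_test, y_test):.3f}")
```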



Author information

Corresponding author

Correspondence to M. Faisal Zaman.

About this article

Cite this article

Zaman, M.F., Hirose, H. Classification Performance of Bagging and Boosting Type Ensemble Methods with Small Training Sets. New Gener. Comput. 29, 277–292 (2011). https://doi.org/10.1007/s00354-011-0303-0

