
Classification Performance of Bagging and Boosting Type Ensemble Methods with Small Training Sets


Abstract

The classification performance of an ensemble method can be understood by studying the bias and variance contributions to its classification error. Statistically, the bias and variance of a single classifier are controlled by the size of the training set and the complexity of the classifier. It has been established, both theoretically and empirically, that the classification performance (and hence the bias and variance) of a single classifier can be partially improved by using a suitable ensemble of that classifier and resampling the original training set. In this paper, we empirically examine the bias-variance decomposition of three different types of ensemble methods trained on samples containing from 10% to at most about 63% of the observations in the original training set. The first ensemble is bagging, the second is a boosting-type ensemble, AdaBoost, and the third is a bagging-type hybrid ensemble called bundling. All ensembles are trained on samples constructed with small subsampling ratios (SSR) of 0.10, 0.20, 0.30, 0.40 and 0.50, as well as with bootstrapping. The experiments are carried out on 20 datasets from the UCI Machine Learning Repository and are designed first to find the optimal training sample size (smaller than the original training set) for each ensemble, and then to identify the ensemble with the best bias-variance performance on these smaller training sets. The bias-variance decomposition of bundling shows that, with small subsamples, this ensemble method has significantly lower bias and variance than the subsampled and bootstrapped versions of bagging and AdaBoost.
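
As a rough illustration of the experimental setup described above, the following sketch trains subsampled bagging, bootstrapped bagging, and AdaBoost on a reduced training sample using scikit-learn. It is not the authors' implementation: bundling and the bias-variance decomposition are omitted, a built-in dataset stands in for the 20 UCI datasets, and the variable names are illustrative; only the SSR values mirror those used in the paper.

```python
# Illustrative sketch only (not the authors' code): subsampled bagging,
# bootstrapped bagging, and AdaBoost on a reduced training set, using
# scikit-learn. Bundling and the bias-variance decomposition are omitted,
# and a built-in dataset stands in for the 20 UCI datasets.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Small subsampling ratios (SSR) considered in the paper.
ssr_values = [0.10, 0.20, 0.30, 0.40, 0.50]

for ssr in ssr_values:
    # Subsampled bagging: each base tree sees only a fraction `ssr` of the
    # training observations, drawn without replacement.
    bag = BaggingClassifier(
        n_estimators=100,
        max_samples=ssr,   # fraction of training observations per base learner
        bootstrap=False,   # subsampling, i.e. sampling without replacement
        random_state=0,
    ).fit(X_train, y_train)
    print(f"subsampled bagging, SSR={ssr:.2f}: "
          f"test accuracy = {bag.score(X_test, y_test):.3f}")

# Ordinary bootstrapped bagging for comparison: sampling with replacement,
# so roughly 63% of the observations are unique in each bootstrap sample.
boot = BaggingClassifier(
    n_estimators=100, max_samples=1.0, bootstrap=True, random_state=0,
).fit(X_train, y_train)
print(f"bootstrapped bagging: test accuracy = {boot.score(X_test, y_test):.3f}")

# AdaBoost trained on a 30% subsample of the training set, one way to mimic
# a boosting ensemble built from a small training sample.
X_sub, _, y_sub, _ = train_test_split(
    X_train, y_train, train_size=0.30, random_state=0)
ada = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X_sub, y_sub)
print(f"AdaBoost on a 30% subsample: "
      f"test accuracy = {ada.score(X_test, y_test):.3f}")
```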



Author information

Corresponding author

Correspondence to M. Faisal Zaman.

About this article

Cite this article

Zaman, M.F., Hirose, H. Classification Performance of Bagging and Boosting Type Ensemble Methods with Small Training Sets. New Gener. Comput. 29, 277–292 (2011). https://doi.org/10.1007/s00354-011-0303-0

