
Bagging and Boosting for the Nearest Mean Classifier: Effects of Sample Size on Diversity and Accuracy

  • Conference paper
  • Appears in: Multiple Classifier Systems (MCS 2002)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2364)

Abstract

In combining classifiers, it is believed that diverse ensembles perform better than non-diverse ones. To test this hypothesis, we study the accuracy and diversity of ensembles obtained by bagging and boosting applied to the nearest mean classifier. In our simulation study we consider two diversity measures: the Q statistic and the disagreement measure. The experiments, carried out on four data sets, show that both the diversity and the accuracy of the ensembles depend on the training sample size. With the exception of very small training sample sizes, both bagging and boosting are more useful when the ensembles consist of diverse classifiers. However, the relationship between diversity and ensemble performance is much stronger in boosting than in bagging.
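To make the studied quantities concrete, the following is a minimal Python sketch (not the authors' implementation; the function names, ensemble size, and synthetic data are illustrative assumptions) of the setup the abstract describes: a nearest mean classifier, a bagged ensemble of such classifiers, and the two pairwise diversity measures. For two classifiers, with N11 test samples classified correctly by both, N00 by neither, and N10/N01 by exactly one, the Q statistic is Q = (N11·N00 − N01·N10) / (N11·N00 + N01·N10) and the disagreement measure is (N01 + N10) / N.

```python
# Sketch of the paper's setup (illustrative, not the authors' code):
# nearest mean classifier, bagging, and two pairwise diversity measures.
import numpy as np

def fit_nearest_mean(X, y):
    """One mean vector per class."""
    classes = np.unique(y)
    means = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, means

def predict_nearest_mean(model, X):
    """Assign each sample to the class with the nearest (Euclidean) mean."""
    classes, means = model
    d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(d2, axis=1)]

def bagged_nearest_mean(X, y, n_members=25, seed=0):
    """Train each ensemble member on a bootstrap replicate of the training set."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_members):
        idx = rng.integers(0, len(X), size=len(X))  # draw with replacement
        models.append(fit_nearest_mean(X[idx], y[idx]))
    return models

def q_and_disagreement(correct_a, correct_b):
    """Q statistic and disagreement measure from boolean correctness vectors."""
    a = np.asarray(correct_a, dtype=bool)
    b = np.asarray(correct_b, dtype=bool)
    n11 = np.sum(a & b)      # both correct
    n00 = np.sum(~a & ~b)    # both wrong
    n10 = np.sum(a & ~b)     # only the first correct
    n01 = np.sum(~a & b)     # only the second correct
    den = n11 * n00 + n01 * n10
    q = (n11 * n00 - n01 * n10) / den if den else 1.0  # identical outputs: Q = 1
    return q, (n01 + n10) / a.size

if __name__ == "__main__":
    # Tiny synthetic two-class demonstration (assumed data, not the paper's).
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
    y = np.repeat([0, 1], 50)
    perm = rng.permutation(len(X))
    X, y = X[perm], y[perm]
    models = bagged_nearest_mean(X[:60], y[:60])
    correct = [predict_nearest_mean(m, X[60:]) == y[60:] for m in models]
    pairs = [q_and_disagreement(correct[i], correct[j])
             for i in range(len(correct)) for j in range(i + 1, len(correct))]
    print("mean Q, mean disagreement:", np.mean(pairs, axis=0))
```

In boosting, by contrast, each replicate would be reweighted toward the samples misclassified by earlier members rather than drawn uniformly; the diversity measures themselves are computed the same way from the members' correctness vectors.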




Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Skurichina, M., Kuncheva, L.I., Duin, R.P.W. (2002). Bagging and Boosting for the Nearest Mean Classifier: Effects of Sample Size on Diversity and Accuracy. In: Roli, F., Kittler, J. (eds) Multiple Classifier Systems. MCS 2002. Lecture Notes in Computer Science, vol 2364. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45428-4_6


  • DOI: https://doi.org/10.1007/3-540-45428-4_6


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43818-2

  • Online ISBN: 978-3-540-45428-1

