
An Empirical Comparison of Three Boosting Algorithms on Real Data Sets with Artificial Class Noise

  • Conference paper
  • Conference: Multiple Classifier Systems (MCS 2003)
  • Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2709)

Abstract

Boosting algorithms are a means of building a strong ensemble classifier by aggregating a sequence of weak hypotheses. In this paper we consider three of the best-known boosting algorithms: Adaboost [9], Logitboost [11] and Brownboost [8]. These algorithms are adaptive, and work by maintaining a set of example and class weights which focus the attention of a base learner on the examples that are hardest to classify. We conduct an empirical study to compare the performance of these algorithms, measured in terms of overall test error rate, on five real data sets. The tests consist of a series of cross-validatory samples. At each validation, we set aside one third of the data chosen at random as a test set, and fit the boosting algorithm to the remaining two thirds, using binary stumps as a base learner. At each stage we record the final training and test error rates, and report the average errors within a 95% confidence interval. We then add artificial class noise to our data sets by randomly reassigning 20% of class labels, and repeat our experiment. We find that Brownboost and Logitboost prove less likely than Adaboost to overfit in this circumstance.
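
To make the experimental protocol in the abstract concrete, the following is a minimal sketch of one arm of it (Adaboost only, since Logitboost and Brownboost are not available in scikit-learn), not the authors' code: repeated random splits holding out one third of the data as a test set, a stump-based Adaboost fit to the remaining two thirds, 20% artificial class noise, and a 95% confidence interval on the mean test error. The choice of library, the UCI breast cancer loader as a stand-in data set, 100 boosting rounds, 50 repetitions, and applying the noise to the training labels only are all illustrative assumptions, not details taken from the paper.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

def add_class_noise(y, frac, rng):
    """Randomly reassign a fraction `frac` of class labels (artificial class noise)."""
    y_noisy = y.copy()
    idx = rng.choice(len(y), size=int(frac * len(y)), replace=False)
    # reassign the selected labels at random among the two classes
    y_noisy[idx] = rng.randint(0, 2, size=len(idx))
    return y_noisy

X, y = load_breast_cancer(return_X_y=True)   # stand-in for one of the paper's five data sets (assumption)
rng = np.random.RandomState(0)

test_errors = []
for rep in range(50):                        # repeated random splits ("cross-validatory samples"); 50 is an assumption
    # set aside one third of the data, chosen at random, as a test set
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=1/3, random_state=rep)
    y_tr_noisy = add_class_noise(y_tr, 0.20, rng)       # 20% artificial class noise (training labels only; assumption)
    # scikit-learn's AdaBoost uses a depth-1 decision tree, i.e. a binary stump, as its default base learner
    clf = AdaBoostClassifier(n_estimators=100, random_state=rep)
    clf.fit(X_tr, y_tr_noisy)
    test_errors.append(1.0 - clf.score(X_te, y_te))     # final test error rate for this split

test_errors = np.array(test_errors)
half_width = 1.96 * test_errors.std(ddof=1) / np.sqrt(len(test_errors))
print(f"mean test error {test_errors.mean():.3f} +/- {half_width:.3f} (95% CI)")
```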

References

  1. UCI Machine Learning Repository. http://www.ics.uci.edu/~mlearn/MLRepository.html.
  2. E. L. Allwein, R. E. Schapire, and Y. Singer. Reducing multiclass to binary: A unifying approach for margin classifiers. Journal of Machine Learning Research, 1:113–141, 2000.
  3. E. Bauer and R. Kohavi. An empirical comparison of voting classification algorithms: Bagging, boosting and variants. Machine Learning, 36:105–142, 1999.
  4. L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Wadsworth, U.S., 1984.
  5. T. G. Dietterich. An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. AI Magazine, 18:97–136, 1997.
  6. C. Domingo and O. Watanabe. Madaboost: A modification of Adaboost. In Thirteenth Annual Conference on Computational Learning Theory, 2000.
  7. Y. Freund. Boosting a weak learning algorithm by majority. Information and Computation, 121, 1995.
  8. Y. Freund. An adaptive version of the boost by majority algorithm. Machine Learning, 43(3):293–318, 2001.
  9. Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. In Second European Conference on Computational Learning Theory, 1995.
  10. Y. Freund and R. E. Schapire. A short introduction to boosting. Journal of Japanese Society for Artificial Intelligence, 14:771–780, 1999.
  11. J. H. Friedman, T. Hastie, and R. Tibshirani. Additive logistic regression: A statistical view of boosting. The Annals of Statistics, 28:337–374, 2000.
  12. D. J. Hand. Construction and Assessment of Classification Rules. John Wiley & Sons, Chichester, 1997.
  13. W. Jiang. Some results on weakly accurate base learners for boosting regression and classification. In Proceedings of the First International Workshop on Multiple Classifier Systems, pages 87–96, 2000.
  14. M. Kearns and L. G. Valiant. Learning boolean formulae or finite automata is as hard as factoring. Technical Report TR-14-88, Harvard University Aiken Computation Laboratory, 1988.
  15. O. L. Mangasarian and W. H. Wolberg. Cancer diagnosis via linear programming. SIAM News, 23(5):1–18, 1990.
  16. R. A. McDonald, I. A. Eckley, and D. J. Hand. A multi-class extension to the Brownboost algorithm. In submission.
  17. J. R. Quinlan. The effect of noise on concept learning. In R. S. Michalski, J. G. Carbonell, and T. M. Mitchell, editors, Machine Learning: An Artificial Intelligence Approach, volume 2, San Mateo, CA, 1986. Morgan Kaufmann.
  18. J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
  19. J. R. Quinlan. Bagging, boosting and C4.5. AAAI/IAAI, 1:725–730, 1996.
  20. R. E. Schapire. The strength of weak learnability. Machine Learning, 5:197–227, 1990.
  21. R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee. Boosting the margin: A new explanation for the effectiveness of voting methods. The Annals of Statistics, 26:1651–1686, 1998.
  22. R. E. Schapire and Y. Singer. Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37:297–336, 1999.
  23. L. G. Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134–1142, 1984.

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

McDonald, R.A., Hand, D.J., Eckley, I.A. (2003). An Empirical Comparison of Three Boosting Algorithms on Real Data Sets with Artificial Class Noise. In: Windeatt, T., Roli, F. (eds) Multiple Classifier Systems. MCS 2003. Lecture Notes in Computer Science, vol 2709. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44938-8_4

  • DOI: https://doi.org/10.1007/3-540-44938-8_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40369-2

  • Online ISBN: 978-3-540-44938-6
