Abstract
Boosting algorithms build a strong ensemble classifier by aggregating a sequence of weak hypotheses. In this paper we consider three of the best-known boosting algorithms: AdaBoost [9], LogitBoost [11] and BrownBoost [8]. These algorithms are adaptive: each maintains a set of example and class weights that focus the attention of a base learner on the examples that are hardest to classify. We conduct an empirical study comparing the performance of these algorithms, measured by overall test error rate, on five real data sets. The tests consist of a series of cross-validatory trials. In each trial, we set aside a randomly chosen third of the data as a test set and fit the boosting algorithm to the remaining two thirds, using binary stumps as the base learner. We record the final training and test error rates for each trial, and report the average errors with 95% confidence intervals. We then add artificial class noise to our data sets by randomly reassigning 20% of the class labels, and repeat the experiment. We find that BrownBoost and LogitBoost are less likely than AdaBoost to overfit in this setting.
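To make the protocol concrete, the sketch below implements one such validation trial in Python with NumPy: discrete AdaBoost over exhaustively searched binary stumps, a random one-third holdout, and optional label flipping to mimic the artificial class noise. This is a minimal illustration, not the authors' code; the helper names (`fit_stump`, `run_trial`) are our own, labels are assumed to lie in {−1, +1}, and noise is injected into the training labels only, which is one reading of the protocol. LogitBoost and BrownBoost would slot into the same harness by replacing the weight-update rule.

```python
import numpy as np

def fit_stump(X, y, w):
    """Exhaustively fit a weighted binary decision stump (feature, threshold, sign)."""
    best_err, best = np.inf, None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (1, -1):
                pred = s * np.where(X[:, j] <= t, 1, -1)
                err = w[pred != y].sum()
                if err < best_err:
                    best_err, best = err, (j, t, s)
    return best_err, best

def adaboost(X, y, T=50):
    """Discrete AdaBoost: reweight examples so each stump focuses on hard cases."""
    n = len(y)
    w = np.full(n, 1.0 / n)
    ensemble = []
    for _ in range(T):
        err, (j, t, s) = fit_stump(X, y, w)
        err = np.clip(err, 1e-12, 1 - 1e-12)   # guard the log against a perfect stump
        alpha = 0.5 * np.log((1 - err) / err)
        pred = s * np.where(X[:, j] <= t, 1, -1)
        w *= np.exp(-alpha * y * pred)          # up-weight misclassified examples
        w /= w.sum()
        ensemble.append((alpha, j, t, s))
    return ensemble

def predict(ensemble, X):
    """Weighted vote of the stumps."""
    score = sum(a * s * np.where(X[:, j] <= t, 1, -1) for a, j, t, s in ensemble)
    return np.where(score >= 0, 1, -1)

def run_trial(X, y, noise=0.0, seed=0):
    """One validation: hold out a random third; optionally flip a fraction of
    the remaining training labels to simulate class noise."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    test, train = idx[: len(y) // 3], idx[len(y) // 3:]
    y_train = y[train].copy()
    y_train[rng.random(len(train)) < noise] *= -1   # reassigns ~20% of labels if noise=0.2
    model = adaboost(X[train], y_train)
    return np.mean(predict(model, X[test]) != y[test])

# Example: average test error and a 95% interval over repeated random splits.
# errors = [run_trial(X, y, noise=0.2, seed=i) for i in range(30)]
# mean = np.mean(errors)
# half_width = 1.96 * np.std(errors, ddof=1) / np.sqrt(len(errors))
```

Averaging `run_trial` over many seeds and reporting a normal-approximation interval, as in the commented lines, reproduces the shape of the reported experiment; the exhaustive stump search is quadratic in the sample size and is chosen for clarity rather than speed.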
References
[1] UCI Machine Learning Repository. http://www.ics.uci.edu/~mlearn/MLRepository.html.
[2] E. L. Allwein, R. E. Schapire, and Y. Singer. Reducing multiclass to binary: A unifying approach for margin classifiers. Journal of Machine Learning Research, 1:113–141, 2000.
[3] E. Bauer and R. Kohavi. An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Machine Learning, 36:105–142, 1999.
[4] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Wadsworth, Belmont, CA, 1984.
[5] T. G. Dietterich. An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Machine Learning, 40:139–157, 2000.
[6] C. Domingo and O. Watanabe. MadaBoost: A modification of AdaBoost. In Proceedings of the Thirteenth Annual Conference on Computational Learning Theory, 2000.
[7] Y. Freund. Boosting a weak learning algorithm by majority. Information and Computation, 121(2):256–285, 1995.
[8] Y. Freund. An adaptive version of the boost by majority algorithm. Machine Learning, 43(3):293–318, 2001.
[9] Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. In Proceedings of the Second European Conference on Computational Learning Theory, 1995.
[10] Y. Freund and R. E. Schapire. A short introduction to boosting. Journal of the Japanese Society for Artificial Intelligence, 14:771–780, 1999.
[11] J. H. Friedman, T. Hastie, and R. Tibshirani. Additive logistic regression: A statistical view of boosting. The Annals of Statistics, 28:337–374, 2000.
[12] D. J. Hand. Construction and Assessment of Classification Rules. John Wiley & Sons, Chichester, 1997.
[13] W. Jiang. Some results on weakly accurate base learners for boosting regression and classification. In Proceedings of the First International Workshop on Multiple Classifier Systems, pages 87–96, 2000.
[14] M. Kearns and L. G. Valiant. Learning Boolean formulae or finite automata is as hard as factoring. Technical Report TR-14-88, Harvard University Aiken Computation Laboratory, 1988.
[15] O. L. Mangasarian and W. H. Wolberg. Cancer diagnosis via linear programming. SIAM News, 23(5):1, 18, 1990.
[16] R. A. McDonald, I. A. Eckley, and D. J. Hand. A multi-class extension to the BrownBoost algorithm. In submission.
[17] J. R. Quinlan. The effect of noise on concept learning. In R. S. Michalski, J. G. Carbonell, and T. M. Mitchell, editors, Machine Learning: An Artificial Intelligence Approach, volume 2. Morgan Kaufmann, San Mateo, CA, 1986.
[18] J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
[19] J. R. Quinlan. Bagging, boosting, and C4.5. In Proceedings of the Thirteenth National Conference on Artificial Intelligence (AAAI-96), pages 725–730, 1996.
[20] R. E. Schapire. The strength of weak learnability. Machine Learning, 5:197–227, 1990.
[21] R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee. Boosting the margin: A new explanation for the effectiveness of voting methods. The Annals of Statistics, 26:1651–1686, 1998.
[22] R. E. Schapire and Y. Singer. Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37:297–336, 1999.
[23] L. G. Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134–1142, 1984.
Cite this paper
McDonald, R. A., Hand, D. J., Eckley, I. A. (2003). An Empirical Comparison of Three Boosting Algorithms on Real Data Sets with Artificial Class Noise. In: Windeatt, T., Roli, F. (eds.) Multiple Classifier Systems. MCS 2003. Lecture Notes in Computer Science, vol. 2709. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44938-8_4