ABSTRACT
Bagging and boosting are well-known ensemble learning methods. They combine multiple learned base models with the aim of improving generalization performance. To date, they have been used primarily in batch mode, i.e., they require multiple passes through the training data. In previous work, we presented online bagging and boosting algorithms that require only one pass through the training data, along with experimental results on some relatively small datasets. Through additional experiments on a variety of larger synthetic and real datasets, this paper demonstrates that our online versions perform comparably to their batch counterparts in terms of classification accuracy. We also demonstrate that the online algorithms achieve substantial reductions in running time because they require fewer passes through the training data.
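As background for the single-pass claim: in the online algorithms of [7], each arriving example is presented to each base model k times, where k is drawn from a Poisson(1) distribution for bagging (online boosting follows the same pattern but adjusts the Poisson parameter according to the example's current weight). The sketch below is a minimal illustration of this online bagging scheme under stated assumptions, not the authors' implementation; the `OnlineBagger` class is hypothetical, and scikit-learn's `GaussianNB` stands in as a base learner only because it supports incremental updates via `partial_fit` (the paper's experiments use other base models, including online decision trees [8]).

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

class OnlineBagger:
    """Minimal sketch of online bagging in the style of Oza & Russell [7]:
    each arriving example is shown to each base model k times, with
    k ~ Poisson(1), which approximates bootstrap resampling as the
    number of examples grows."""

    def __init__(self, make_base_model, n_models=10, seed=0):
        self.models = [make_base_model() for _ in range(n_models)]
        self.rng = np.random.default_rng(seed)

    def update(self, x, y, classes):
        # Single pass: each example is processed once and never stored.
        for model in self.models:
            for _ in range(self.rng.poisson(1.0)):
                model.partial_fit([x], [y], classes=classes)

    def predict(self, x):
        # Combine the base models by unweighted majority vote.
        votes = [m.predict([x])[0] for m in self.models]
        values, counts = np.unique(votes, return_counts=True)
        return values[np.argmax(counts)]

# Hypothetical usage on a stream of (features, label) pairs:
# bagger = OnlineBagger(GaussianNB, n_models=10)
# for x, y in stream:
#     bagger.update(x, y, classes=[0, 1])
```

Because each example is discarded after the update, memory cost is independent of the stream length, which is the source of the running-time and storage advantage reported above.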
REFERENCES
- 1. S. D. Bay. The UCI KDD archive, 1999. (URL: http://kdd.ics.uci.edu).
- 2. C. Blake, E. Keogh, and C. J. Merz. UCI repository of machine learning databases, 1999. (URL: http://www.ics.uci.edu/~mlearn/MLRepository.html).
- 3. L. Breiman. Bagging predictors. Machine Learning, 24(2):123-140, 1996.
- 4. Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119-139, 1997.
- 5. O. L. Mangasarian, R. Setiono, and W. H. Wolberg. Pattern recognition via linear programming: Theory and application to medical diagnosis. In Thomas F. Coleman and Yuying Li, editors, Large-Scale Numerical Optimization, pages 22-30. SIAM Publications, 1990.
- 6. N. C. Oza and S. Russell. Experimental comparisons of online and batch versions of bagging and boosting. Technical report, Electrical Engineering and Computer Science Department, University of California, Berkeley, CA. In preparation.
- 7. N. C. Oza and S. Russell. Online bagging and boosting. In Artificial Intelligence and Statistics 2001, pages 105-112. Morgan Kaufmann, 2001.
- 8. P. E. Utgoff, N. C. Berkman, and J. A. Clouse. Decision tree induction based on efficient tree restructuring. Machine Learning, 29(1):5-44, 1997.