Synonyms
Bootstrap aggregating
Definition
Bagging (Bootstrap Aggregating) uses "majority voting" to combine the outputs of different inductive models, each constructed from a bootstrap sample of the same training set. A bootstrap sample has the same size as the training data and is drawn uniformly from the original training set with replacement. That is, after an example is selected, it remains in the training set for subsequent draws, so the same example can appear multiple times in one bootstrap sample. When the training set is sufficiently large, a bootstrap sample contains, on average, 63.2% of the unique examples from the original training set, and the rest of the sample consists of duplicates (the probability that a given example is never drawn in n draws with replacement is (1 − 1/n)^n ≈ e^(−1) ≈ 0.368). To make full use of bagging, one typically needs to generate at least 50 bootstrap samples and construct 50 classifiers from them. During prediction, the class label receiving the most votes, or the most predictions, from the base-level classifiers is the final prediction.
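As an illustration only (not part of the original entry), the following Python sketch shows one way to implement this procedure, using scikit-learn decision trees as the base learners. The function names, the choice of 50 estimators, and the assumption that class labels are encoded as non-negative integers are all assumptions made for this example.

```python
# A minimal sketch of bagging: train one classifier per bootstrap sample,
# then combine predictions by majority vote. Illustrative, not the entry's
# reference implementation.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, n_estimators=50, random_state=0):
    """Train one decision tree per bootstrap sample of (X, y)."""
    rng = np.random.default_rng(random_state)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        # Draw n indices uniformly with replacement: the bootstrap sample
        # has the same size as the training set, duplicates allowed.
        idx = rng.integers(0, n, size=n)
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    """Majority vote over the base-level classifiers' predictions.

    Assumes class labels are non-negative integers (required by bincount).
    """
    votes = np.stack([m.predict(X) for m in models])  # shape: (n_estimators, n_examples)
    # For each example (column), return the most frequently predicted label.
    return np.array([np.bincount(col).argmax() for col in votes.T])

# Example usage (illustrative):
#   models = bagging_fit(X_train, y_train)
#   y_pred = bagging_predict(models, X_test)
```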
Recommended Reading
Amit Y. and Geman D. Shape quantization and recognition with randomized trees. Neural Comput., 9(7):1545–1588, 1997.
Bradford J.P., Kunz C., Kohavi R., Brunk C., and Brodley C.E. Pruning decision trees with misclassification costs. In Proc. Eur. Conf. Mach. Learn., 1998, pp. 131–136.
Breiman L. Bagging predictors. Mach. Learn., 24(2):123–140, 1996.
Buntine W. Learning classification trees. In Artificial Intelligence frontiers in statistics, D.J. Hand (ed.). Chapman & Hall, London, 1993, pp. 182–201.
Domingos P. Occam’s two razors: The sharp and the blunt. In Proc. 4th Int. Conf. on Knowledge Discovery and Data Mining, 1998.
Fan W., Greengrass E., McCloskey J., Yu P.S., and Drummey K. Effective estimation of posterior probabilities: Explaining the accuracy of randomized decision tree approaches. In Proc. IEEE Int. Conf. on Data Mining, 2005, pp. 154–161.
Fan W., Wang H., Yu P.S., and Ma S. Is random model better? on its accuracy and efficiency. In Proc. 19th Int. Conf. on Data Engineering, 2003.
Freund Y. and Schapire R. A decision-theoretic generalization of on-line learning and an application to boosting. Comput. Syst. Sci., 55(1):119–139, 1997.
Gehrke J., Ganti V., Ramakrishnan R., and Loh W.-Y. BOAT-optimistic decision tree construction. In Proc. ACM SIGMOD Int. Conf. on Management of Data, 1999.
Kearns M. and Mansour Y. On the boosting ability of top-down decision tree learning algorithms. In Proc. Annual ACM Symp. on the Theory of Computing, 1996, pp. 459–468.
Mehta M., Rissanen J., and Agrawal R. MDL-based decision tree pruning. In Proc. 1st Int. Conf. on Knowledge Discovery and Data Mining, 1995, pp. 216–221.
Quinlan R. C4.5: Programs for Machine Learning. Morgan Kaufmann, Los Altos, CA, 1993.
Shawe-Taylor J. and Cristianini N. Data-dependent structural risk minimisation for perceptron decision trees. In Advances in Neural Information Processing Systems 10, M. Jordan, M. Kearns, and S. Solla (eds.). MIT Press, Cambridge, MA, 1998, pp. 336–342.
Zhang K., Xu Z., Peng J., and Buckles B.P. Learning through changes: An empirical study of dynamic behaviors of probability estimation trees. In Proc. IEEE Int. Conf. on Data Mining, 2005, pp. 817–820.
Zhang K. and Fan W. Forecasting skewed biased stochastic ozone days: analyses, solutions and beyond. Knowl. Inf. Syst., 14(3), 2008.