
Boosting with Diverse Base Classifiers

  • Conference paper
Learning Theory and Kernel Machines

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2777)

Abstract

We establish a new bound on the generalization error rate of the Boost-by-Majority algorithm. The bound holds when the algorithm is applied to a collection of base classifiers that contains a “diverse” subset of “good” classifiers, in a precisely defined sense. We describe cross-validation experiments that suggest that Boost-by-Majority can be the basis of a practically useful learning method, often improving on the generalization of AdaBoost on large datasets.
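
For readers unfamiliar with Boost-by-Majority, the sketch below illustrates the weighting scheme at the heart of the algorithm as described by Freund (1995): at each round, an example's weight is the probability that it finishes exactly at the majority-vote boundary, given how many times it has been classified correctly so far and assuming each remaining base classifier is correct with probability 1/2 + gamma. This is a minimal sketch, not the experimental setup of the paper; the names bbm_weight, boost_by_majority, base_learner, T, and gamma are illustrative, and the round-indexing convention follows one common presentation of the algorithm.

    import numpy as np
    from math import comb

    def bbm_weight(r, t, T, gamma):
        # Weight of an example classified correctly r times after t completed
        # rounds, out of T total rounds: the probability that it ends with
        # exactly floor(T/2) correct votes when each of the remaining T - t
        # base classifiers is correct with probability 1/2 + gamma
        # (Freund, 1995; indexing convention illustrative).
        need = T // 2 - r              # further correct votes needed to reach the boundary
        remaining = T - t
        if need < 0 or need > remaining:
            return 0.0
        return comb(remaining, need) * (0.5 + gamma) ** need * (0.5 - gamma) ** (remaining - need)

    def boost_by_majority(X, y, base_learner, T=101, gamma=0.1):
        # base_learner(X, y, sample_weight) is an assumed interface: it returns
        # a classifier with predictions in {-1, +1} whose weighted error is at
        # most 1/2 - gamma on the given distribution.
        n = len(y)
        correct = np.zeros(n, dtype=int)       # rounds each example has been classified correctly
        hypotheses = []
        for t in range(T):
            w = np.array([bbm_weight(r, t, T, gamma) for r in correct])
            if w.sum() == 0:                   # every example's final vote is already decided
                break
            h = base_learner(X, y, w / w.sum())
            correct += (h.predict(X) == y).astype(int)
            hypotheses.append(h)
        # The combined classifier is an unweighted majority vote of the base classifiers.
        return lambda Xnew: np.sign(sum(h.predict(Xnew) for h in hypotheses))

With scikit-learn, for instance, base_learner could be lambda X, y, w: DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w), which uses decision stumps of the kind commonly employed as base classifiers in boosting experiments.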






Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Dasgupta, S., Long, P.M. (2003). Boosting with Diverse Base Classifiers. In: Schölkopf, B., Warmuth, M.K. (eds) Learning Theory and Kernel Machines. Lecture Notes in Computer Science (LNAI), vol 2777. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45167-9_21

  • DOI: https://doi.org/10.1007/978-3-540-45167-9_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40720-1

  • Online ISBN: 978-3-540-45167-9

  • eBook Packages: Springer Book Archive
