Abstract
We consider the problem of minimax optimality and of adaptivity to the margin and to regularity in binary classification. Under the margin assumption (low noise condition), we prove an oracle inequality satisfied by an aggregation procedure based on exponential weights. This oracle inequality has an optimal residual term of order $(\log M / n)^{\kappa/(2\kappa-1)}$, where $\kappa$ is the margin parameter, $M$ the number of classifiers to aggregate, and $n$ the number of observations. We use this inequality first to construct classifiers that are minimax under margin and regularity assumptions, and second to aggregate them into a classifier that is adaptive both to the margin and to regularity. Moreover, by aggregating only $\log n$ plug-in classifiers, we obtain an easily implementable classifier that is adaptive both to the margin and to regularity.
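To fix ideas, the following is a minimal sketch of aggregation with exponential weights, assuming $M$ classifiers evaluated on a common sample and weighted by their empirical 0-1 risks. The function name `exponential_weights_aggregate`, the temperature parameter `beta`, and the use of the 0-1 loss are illustrative assumptions, not the paper's exact procedure or tuning.

```python
import numpy as np

def exponential_weights_aggregate(predictions, y, beta):
    """Aggregate M classifiers by exponentially weighting their empirical
    risks (illustrative sketch, not the paper's exact scheme).

    predictions : (M, n) array of labels in {-1, +1}, one row per classifier
    y           : (n,) array of observed labels in {-1, +1}
    beta        : temperature; larger beta concentrates weight on the
                  empirically best classifiers
    """
    # Empirical 0-1 risk of each of the M classifiers.
    risks = np.mean(predictions != y, axis=1)   # shape (M,)
    # Exponential weights, normalized to sum to one.
    w = np.exp(-beta * risks)
    w /= w.sum()

    def classify(new_predictions):
        # new_predictions : (M,) labels of the M classifiers at a new point.
        # Sign of the convex combination of the classifiers' outputs.
        return np.sign(w @ new_predictions)

    return w, classify

# Toy usage: 3 classifiers, 5 observations.
y = np.array([1, -1, 1, 1, -1])
preds = np.array([[1, -1, 1, 1, -1],     # perfect classifier
                  [1, 1, 1, 1, 1],       # always predicts +1
                  [-1, -1, -1, 1, -1]])  # mixed performance
w, classify = exponential_weights_aggregate(preds, y, beta=5.0)
print(w)  # most weight goes to the first (empirically best) classifier
```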
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
Cite this paper
Lecué, G. (2006). Optimal Oracle Inequality for Aggregation of Classifiers Under Low Noise Condition. In: Lugosi, G., Simon, H.U. (eds.) Learning Theory. COLT 2006. Lecture Notes in Computer Science, vol. 4005. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11776420_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-35294-5
Online ISBN: 978-3-540-35296-9
eBook Packages: Computer Science (R0)