Abstract
We derive new margin-based inequalities for the probability of error of classifiers. The main feature of these bounds is that they can be calculated using the training data and therefore may be effectively used for model selection purposes. In particular, the bounds involve quantities such as the empirical fat-shattering dimension and covering number measured on the training data, as opposed to their worst-case counterparts traditionally used in such analyses, and appear to be sharper and more general than recent results involving empirical complexity measures. In addition, we also develop an alternative data-based bound for the generalization error of classes of convex combinations of classifiers involving an empirical complexity measure that is more easily computable than the empirical covering number or fat-shattering dimension. We also show an example of efficient computation of the new bounds.
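The empirical complexity measures mentioned above are computed on the training sample itself rather than over all possible inputs. As an illustrative sketch (not the authors' algorithm), the following Python snippet estimates an empirical sup-norm covering number for a finite class of classifiers by greedily selecting cover centers among the classifiers' output vectors on the training points; the function name and the greedy strategy are assumptions for illustration, and the greedy cover only upper-bounds the minimal one.

```python
import numpy as np

def empirical_covering_number(outputs, eps):
    """Greedy upper bound on the empirical sup-norm eps-covering number.

    outputs: array of shape (num_functions, num_samples); each row holds one
    classifier's real-valued outputs on the training points.
    Returns the size of a greedily constructed eps-cover, which upper-bounds
    the minimal cover size.
    """
    remaining = list(range(outputs.shape[0]))
    centers = []
    while remaining:
        c = remaining[0]          # pick an uncovered function as a new center
        centers.append(c)
        # discard every function within sup-norm distance eps of the center
        remaining = [i for i in remaining
                     if np.max(np.abs(outputs[i] - outputs[c])) > eps]
    return len(centers)

# Example: a random finite class of 50 functions evaluated on 5 training points
rng = np.random.default_rng(0)
outs = rng.uniform(-1.0, 1.0, size=(50, 5))
print(empirical_covering_number(outs, 0.5))
```

Note that the cover size shrinks as eps grows: with outputs in [-1, 1], any eps larger than 2 makes a single center suffice.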
© 2001 Springer-Verlag Berlin Heidelberg
Cite this paper
Kégl, B., Linder, T., Lugosi, G. (2001). Data-Dependent Margin-Based Generalization Bounds for Classification. In: Helmbold, D., Williamson, B. (eds) Computational Learning Theory. COLT 2001. Lecture Notes in Computer Science, vol. 2111. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44581-1_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42343-0
Online ISBN: 978-3-540-44581-4