Abstract
We derive new margin-based inequalities for the probability of error of classifiers. The main feature of these bounds is that they can be calculated using the training data and therefore may be effectively used for model selection purposes. In particular, the bounds involve quantities such as the empirical fat-shattering dimension and covering number measured on the training data, as opposed to their worst-case counterparts traditionally used in such analyses, and appear to be sharper and more general than recent results involving empirical complexity measures. In addition, we also develop an alternative data-based bound for the generalization error of classes of convex combinations of classifiers involving an empirical complexity measure that is more easily computable than the empirical covering number or fat-shattering dimension. We also show an example of efficient computation of the new bounds.
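The empirical complexity measures mentioned above are computed on the training sample itself rather than over all possible inputs. As an illustrative sketch (not the authors' algorithm), the following Python snippet estimates an empirical sup-norm covering number for a finite class of classifiers by greedily selecting cover centers among the classifiers' output vectors on the training points; the function name and the greedy strategy are assumptions for illustration, and the greedy cover only upper-bounds the minimal one.

```python
import numpy as np

def empirical_covering_number(outputs, eps):
    """Greedy upper bound on the empirical sup-norm eps-covering number.

    outputs: array of shape (num_functions, num_samples); each row holds one
    classifier's real-valued outputs on the training points.
    Returns the size of a greedily constructed eps-cover, which upper-bounds
    the minimal cover size.
    """
    remaining = list(range(outputs.shape[0]))
    centers = []
    while remaining:
        c = remaining[0]          # pick an uncovered function as a new center
        centers.append(c)
        # discard every function within sup-norm distance eps of the center
        remaining = [i for i in remaining
                     if np.max(np.abs(outputs[i] - outputs[c])) > eps]
    return len(centers)

# Example: a random finite class of 50 functions evaluated on 5 training points
rng = np.random.default_rng(0)
outs = rng.uniform(-1.0, 1.0, size=(50, 5))
print(empirical_covering_number(outs, 0.5))
```

Note that the cover size shrinks as eps grows: with outputs in [-1, 1], any eps larger than 2 makes a single center suffice.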
© 2001 Springer-Verlag Berlin Heidelberg
Cite this paper
Kégl, B., Linder, T., Lugosi, G. (2001). Data-Dependent Margin-Based Generalization Bounds for Classification. In: Helmbold, D., Williamson, B. (eds) Computational Learning Theory. COLT 2001. Lecture Notes in Computer Science, vol. 2111. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44581-1_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42343-0
Online ISBN: 978-3-540-44581-4