Agnostic Learning Nonconvex Function Classes

Mendelson, Shahar; Williamson, Robert C.

doi:10.1007/3-540-45435-7_1

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 2375))

Included in the following conference series:

International Conference on Computational Learning Theory

Abstract

We consider the sample complexity of agnostic learning with respect to squared loss. It is known that if the function class F used for learning is convex, then one can obtain better sample complexity bounds than in the general case. It has been claimed that a lower bound shows that there is an essential gap in the rate. In this paper we show that the proof of this lower bound has a gap in it, although we do not provide a definitive answer as to its validity. More positively, we show that one can obtain “fast” sample complexity bounds for nonconvex F for “most” target conditional expectations. The new bounds depend on the detailed geometry of F, in particular on the distance, in a certain sense, of the target’s conditional expectation from the set of nonuniqueness points of the class F.
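
As an informal illustration of what a nonuniqueness point is, the sketch below takes the usual approximation-theory reading (an assumption here, since the abstract does not spell out the definition): a target is a nonuniqueness point of F if more than one member of F attains the minimal L2 distance to it. The two-function class, the two-point domain, and the helper names `l2_dist` and `best_approximations` are all hypothetical and are not taken from the paper.

```python
import math

def l2_dist(f, g):
    """Root-mean-square distance between two functions tabulated on the same grid."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f, g)) / len(f))

def best_approximations(target, function_class, tol=1e-12):
    """Return the minimal distance and the indices of all class members attaining it."""
    dists = [l2_dist(target, f) for f in function_class]
    d_min = min(dists)
    return d_min, [i for i, d in enumerate(dists) if d - d_min <= tol]

# Hypothetical nonconvex class: two functions on a two-point domain.
# (Their midpoint is not in the class, so the class is not convex.)
F = [(0.0, 0.0), (1.0, 0.0)]

# A target with a single nearest class member.
_, unique = best_approximations((0.2, 0.5), F)

# A target equidistant from both members: a nonuniqueness point under the assumed definition.
_, tied = best_approximations((0.5, 0.5), F)

print(unique, tied)
```

In this toy setting the targets with more than one best approximation are exactly those equidistant from the two class members; a convex class would have no such points, since the nearest point in a closed convex subset of L2 is unique.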

Author information

Authors and Affiliations

Research School of Information Sciences and Engineering, Australian National University, Canberra, ACT, 0200, Australia
Shahar Mendelson & Robert C. Williamson

Editor information

Editors and Affiliations

Research School of Information Sciences and Engineering, Australian National University, Canberra, ACT, 0200, Australia
Jyrki Kivinen
Computer Science Department, University of Illinois at Chicago, 851 S. Morgan St., Chicago, IL, 60607, USA
Robert H. Sloan

About this paper

Cite this paper

Mendelson, S., Williamson, R.C. (2002). Agnostic Learning Nonconvex Function Classes. In: Kivinen, J., Sloan, R.H. (eds) Computational Learning Theory. COLT 2002. Lecture Notes in Computer Science, vol 2375. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45435-7_1

DOI: https://doi.org/10.1007/3-540-45435-7_1
Published: 25 June 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43836-6
Online ISBN: 978-3-540-45435-9
eBook Packages: Springer Book Archive
