Agnostic Learning Nonconvex Function Classes

  • Conference paper
Computational Learning Theory (COLT 2002)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2375)

Abstract

We consider the sample complexity of agnostic learning with respect to squared loss. It is known that if the function class F used for learning is convex, then one can obtain better sample complexity bounds than usual. It has been claimed that there is a lower bound showing an essential gap in the rate. In this paper we show that the lower bound proof has a gap in it, although we do not provide a definitive answer to its validity. More positively, we show that one can obtain “fast” sample complexity bounds for nonconvex F for “most” target conditional expectations. The new bounds depend on the detailed geometry of F, in particular on the distance, in a certain sense, of the target’s conditional expectation from the set of nonuniqueness points of the class F.
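The geometric notion driving the result is the set of nonuniqueness points: targets whose best approximation in F is not unique, which can only occur when F is nonconvex. A minimal numerical sketch of this phenomenon (the two-element class and the targets below are hypothetical, chosen purely for illustration; L2 norms on [0, 1] are approximated by discretization):

```python
import numpy as np

# Discretize [0, 1] to approximate L2([0, 1]) norms.
x = np.linspace(0.0, 1.0, 1001)

def l2_dist(f, g):
    """Approximate L2 distance between two functions on [0, 1]."""
    return np.sqrt(np.trapz((f(x) - g(x)) ** 2, x))

# A nonconvex class F with just two elements (their midpoint is not in F).
f1 = lambda t: t           # f1(t) = t
f2 = lambda t: -t          # f2(t) = -t
g  = lambda t: 0.0 * t     # target g = 0 sits exactly between f1 and f2

d1, d2 = l2_dist(f1, g), l2_dist(f2, g)
# d1 == d2: g has two best approximations in F, so g is a
# nonuniqueness point of F.

# A target bounded away from that set has a unique nearest element.
h = lambda t: 0.1 * t
unique = l2_dist(f1, h) < l2_dist(f2, h)
```

The paper's "fast" bounds apply to targets like h, whose conditional expectation is far (in the appropriate sense) from nonuniqueness points such as g.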



Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mendelson, S., Williamson, R.C. (2002). Agnostic Learning Nonconvex Function Classes. In: Kivinen, J., Sloan, R.H. (eds) Computational Learning Theory. COLT 2002. Lecture Notes in Computer Science, vol 2375. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45435-7_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43836-6

  • Online ISBN: 978-3-540-45435-9

