Distribution-free consistency of empirical risk minimization and support vector regression

  • Original Article
  • Mathematics of Control, Signals, and Systems

Abstract

In this paper, we focus on the generalization ability of the empirical risk minimization technique in the framework of agnostic learning, and consider the support vector regression method as a special case. We give a set of analytic conditions that characterize the empirical risk minimization methods and their approximations that are distribution-free consistent. Then, utilizing the weak topology of the feature space, we show that support vector regression, possibly with a discontinuous kernel, is distribution-free consistent. Moreover, a tighter generalization error bound is shown to be achieved in certain cases if the value of the regularization parameter grows as the sample size increases. The results carry over to ν-support vector regression.
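For orientation, a standard formulation of ε-support vector regression casts it as regularized empirical risk minimization over a reproducing kernel Hilbert space H. The following is a minimal sketch of that connection; the ε-insensitive loss, the regularization convention, and the symbol λ_n are common-usage assumptions rather than the paper's own notation:

\min_{f \in \mathcal{H}} \ \frac{1}{n} \sum_{i=1}^{n} \bigl| y_i - f(x_i) \bigr|_{\varepsilon} \, + \, \lambda_n \, \| f \|_{\mathcal{H}}^{2},
\qquad |t|_{\varepsilon} := \max\{\, 0, \; |t| - \varepsilon \,\},

where (x_1, y_1), …, (x_n, y_n) is the training sample and λ_n > 0 is the regularization weight. The abstract's remark about the regularization parameter concerns how this trade-off weight may be allowed to depend on the sample size n; the precise growth condition, and the resulting tighter generalization error bound, are stated in the paper itself.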

Author information

Corresponding author

Correspondence to Ji-Woong Lee.

Additional information

This work was supported in part by the AFOSR/DARPA MURI Center under Grant No. F49620-95-1-0524.

About this article

Cite this article

Lee, J.-W., Khargonekar, P.P. Distribution-free consistency of empirical risk minimization and support vector regression. Math. Control Signals Syst. 21, 111–125 (2009). https://doi.org/10.1007/s00498-009-0041-8
