Abstract
In this paper, we focus on the generalization ability of the empirical risk minimization technique in the framework of agnostic learning, with the support vector regression method considered as a special case. We give a set of analytic conditions characterizing the empirical risk minimization methods, and their approximations, that are distribution-free consistent. Then, utilizing the weak topology of the feature space, we show that support vector regression, possibly with a discontinuous kernel, is distribution-free consistent. Moreover, a tighter generalization error bound is shown to be achievable in certain cases if the regularization parameter grows with the sample size. The results carry over to ν-support vector regression.
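For orientation, a standard primal formulation of ε-insensitive support vector regression, viewed as regularized empirical risk minimization, is sketched below; the notation (sample (x_1, y_1), …, (x_n, y_n), feature map φ, regularization parameter C, tube width ε) is assumed here for illustration and need not match the paper's own.

\[
% Sketch of the standard epsilon-SVR primal problem (notation assumed):
\begin{aligned}
\min_{w,\,b,\,\xi,\,\xi^{*}} \quad & \tfrac{1}{2}\|w\|^{2} + C \sum_{i=1}^{n} \left( \xi_i + \xi_i^{*} \right) \\
\text{s.t.} \quad & y_i - \langle w, \phi(x_i) \rangle - b \le \varepsilon + \xi_i, \\
& \langle w, \phi(x_i) \rangle + b - y_i \le \varepsilon + \xi_i^{*}, \\
& \xi_i,\ \xi_i^{*} \ge 0, \qquad i = 1, \dots, n.
\end{aligned}
\]

In this parametrization, the abstract's growth condition amounts to letting the regularization parameter increase with the sample size, e.g. \(C = C_0\, n^{\alpha}\) for some \(\alpha > 0\); this schedule is purely illustrative and is not taken from the paper.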
Additional information
This work was supported in part by the AFOSR/DARPA MURI Center under Grant No. F49620-95-1-0524.
Cite this article
Lee, J.W., Khargonekar, P.P. Distribution-free consistency of empirical risk minimization and support vector regression. Math Control Signals Syst 21, 111–125 (2009). https://doi.org/10.1007/s00498-009-0041-8