Abstract
We derive new bounds for the generalization error of feature space machines, such as support vector machines and related regularization networks, by obtaining new bounds on their covering numbers. The proofs are based on a viewpoint that is apparently novel in the field of statistical learning theory. The hypothesis class is described in terms of a linear operator mapping from a possibly infinite-dimensional unit ball in feature space into a finite-dimensional space. The covering numbers of the class are then determined via the entropy numbers of the operator. These numbers, which characterize the degree of compactness of the operator, can be bounded in terms of the eigenvalues of an integral operator induced by the kernel function used by the machine. As a consequence, we are able to theoretically explain the effect of the choice of kernel function on the generalization performance of support vector machines.
Supported by the Australian Research Council and the DFG (# Ja 379/71).
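The abstract's chain of reasoning runs from the kernel, to the eigenvalues of its induced integral operator, to entropy numbers and covering numbers. The spectral step can be illustrated numerically: the eigenvalues of the Gram matrix K/n on n sample points approximate the eigenvalues of the kernel's integral operator (the Nyström approximation), and a smoother kernel yields faster spectral decay, which the paper's bounds translate into smaller entropy and covering numbers. The following sketch is purely illustrative and not taken from the paper; the kernels, bandwidths, and sample sizes are arbitrary choices for demonstration.

```python
import numpy as np

def operator_eigenvalues(kernel, points):
    """Nystrom estimate of the eigenvalues of the integral operator
    induced by `kernel` on the sampled domain: eigenvalues of K/n."""
    n = len(points)
    K = kernel(points[:, None], points[None, :])  # Gram matrix on the sample
    eigvals = np.linalg.eigvalsh(K / n)           # symmetric eigensolver
    return np.sort(eigvals)[::-1]                 # descending order

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=200)              # sample points in [0, 1]

# Two kernels of different smoothness, hence different spectral decay.
rbf = lambda s, t: np.exp(-50.0 * (s - t) ** 2)       # Gaussian: very smooth
laplace = lambda s, t: np.exp(-10.0 * np.abs(s - t))  # Laplacian: less smooth

lam_rbf = operator_eigenvalues(rbf, x)
lam_lap = operator_eigenvalues(laplace, x)

# The smoother (Gaussian) kernel's normalized spectrum decays much faster,
# which in the paper's framework gives tighter covering-number bounds.
print("Gaussian, first 5 eigenvalues: ", lam_rbf[:5])
print("Laplacian, first 5 eigenvalues:", lam_lap[:5])
```

Comparing, say, the ratio of the 20th to the largest eigenvalue for each kernel makes the decay difference concrete; the choice of index and bandwidths above is incidental to the point.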
Copyright information
© 1999 Springer-Verlag Berlin Heidelberg
Williamson, R.C., Smola, A.J., Schölkopf, B. (1999). Entropy Numbers, Operators and Support Vector Kernels. In: Fischer, P., Simon, H.U. (eds) Computational Learning Theory. EuroCOLT 1999. Lecture Notes in Computer Science(), vol 1572. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49097-3_23
Print ISBN: 978-3-540-65701-9
Online ISBN: 978-3-540-49097-5