Abstract
We derive new bounds for the generalization error of feature space machines, such as support vector machines and related regularization networks, by obtaining new bounds on their covering numbers. The proofs are based on a viewpoint that is apparently novel in the field of statistical learning theory. The hypothesis class is described in terms of a linear operator mapping from a possibly infinite-dimensional unit ball in feature space into a finite-dimensional space. The covering numbers of the class are then determined via the entropy numbers of the operator. These numbers, which characterize the degree of compactness of the operator, can be bounded in terms of the eigenvalues of an integral operator induced by the kernel function used by the machine. As a consequence, we are able to theoretically explain the effect of the choice of kernel function on the generalization performance of support vector machines.
Supported by the Australian Research Council and the DFG (# Ja 379/71).
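The abstract's chain of reasoning runs from the kernel, to the eigenvalues of its induced integral operator, to entropy numbers and covering numbers. The spectral step can be illustrated numerically: the eigenvalues of the Gram matrix K/n on n sample points approximate the eigenvalues of the kernel's integral operator (the Nyström approximation), and a smoother kernel yields faster spectral decay, which the paper's bounds translate into smaller entropy and covering numbers. The following sketch is purely illustrative and not taken from the paper; the kernels, bandwidths, and sample sizes are arbitrary choices for demonstration.

```python
import numpy as np

def operator_eigenvalues(kernel, points):
    """Nystrom estimate of the eigenvalues of the integral operator
    induced by `kernel` on the sampled domain: eigenvalues of K/n."""
    n = len(points)
    K = kernel(points[:, None], points[None, :])  # Gram matrix on the sample
    eigvals = np.linalg.eigvalsh(K / n)           # symmetric eigensolver
    return np.sort(eigvals)[::-1]                 # descending order

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=200)              # sample points in [0, 1]

# Two kernels of different smoothness, hence different spectral decay.
rbf = lambda s, t: np.exp(-50.0 * (s - t) ** 2)       # Gaussian: very smooth
laplace = lambda s, t: np.exp(-10.0 * np.abs(s - t))  # Laplacian: less smooth

lam_rbf = operator_eigenvalues(rbf, x)
lam_lap = operator_eigenvalues(laplace, x)

# The smoother (Gaussian) kernel's normalized spectrum decays much faster,
# which in the paper's framework gives tighter covering-number bounds.
print("Gaussian, first 5 eigenvalues: ", lam_rbf[:5])
print("Laplacian, first 5 eigenvalues:", lam_lap[:5])
```

Comparing, say, the ratio of the 20th to the largest eigenvalue for each kernel makes the decay difference concrete; the choice of index and bandwidths above is incidental to the point.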
Copyright information
© 1999 Springer-Verlag Berlin Heidelberg
Williamson, R.C., Smola, A.J., Schölkopf, B. (1999). Entropy Numbers, Operators and Support Vector Kernels. In: Fischer, P., Simon, H.U. (eds) Computational Learning Theory. EuroCOLT 1999. Lecture Notes in Computer Science(), vol 1572. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49097-3_23
Print ISBN: 978-3-540-65701-9
Online ISBN: 978-3-540-49097-5