
Entropy Numbers, Operators and Support Vector Kernels

Conference paper. Published in: Computational Learning Theory (EuroCOLT 1999).

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1572).

Abstract

We derive new bounds on the generalization error of feature space machines, such as support vector machines and related regularization networks, by obtaining new bounds on their covering numbers. The proofs are based on a viewpoint that is apparently novel in the field of statistical learning theory. The hypothesis class is described in terms of a linear operator mapping from a possibly infinite-dimensional unit ball in feature space into a finite-dimensional space. The covering numbers of the class are then determined via the entropy numbers of the operator. These numbers, which characterize the degree of compactness of the operator, can be bounded in terms of the eigenvalues of an integral operator induced by the kernel function used by the machine. As a consequence, we can theoretically explain the effect of the choice of kernel function on the generalization performance of support vector machines.
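The program the abstract describes can be illustrated numerically (this is a sketch of the general mechanism, not the paper's own derivation): the spectrum of the kernel's integral operator is estimated from an empirical Gram matrix, and a Carl-type quantity for a diagonal operator shows how eigenvalue decay translates into entropy-number decay. The kernel choice (Gaussian here), the normalization `K/m`, and the omission of absolute constants are all illustrative assumptions.

```python
import numpy as np

def rbf_gram(X, sigma=1.0):
    """Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def kernel_eigenvalues(X, sigma=1.0):
    """Eigenvalues of K/m in descending order: a standard empirical
    estimate of the spectrum of the kernel's integral operator."""
    m = X.shape[0]
    lam = np.linalg.eigvalsh(rbf_gram(X, sigma) / m)
    return np.sort(lam)[::-1]

def carl_type_quantity(lam, n):
    """Schematic Carl-type bound for a diagonal operator with entries
    sqrt(lam_j):  sup_k 2^{-n/k} (lam_1 * ... * lam_k)^{1/(2k)}.
    Absolute constants are omitted; only the scaling in n is tracked."""
    lam = np.maximum(np.asarray(lam, dtype=float), 1e-300)  # clip numeric noise
    k = np.arange(1, lam.size + 1, dtype=float)
    # geometric mean of sqrt(lam_1), ..., sqrt(lam_k), for every k at once
    geo = np.exp(np.cumsum(0.5 * np.log(lam)) / k)
    return float(np.max(2.0 ** (-n / k) * geo))
```

The faster the eigenvalues of the kernel operator decay, the faster `carl_type_quantity` falls in `n`; this is the mechanism by which the kernel choice enters the covering-number, and hence generalization, bounds.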

Supported by the Australian Research Council and the DFG (# Ja 379/71).




Copyright information

© 1999 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Williamson, R.C., Smola, A.J., Schölkopf, B. (1999). Entropy Numbers, Operators and Support Vector Kernels. In: Fischer, P., Simon, H.U. (eds) Computational Learning Theory. EuroCOLT 1999. Lecture Notes in Computer Science (LNAI), vol 1572. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49097-3_23


  • DOI: https://doi.org/10.1007/3-540-49097-3_23


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-65701-9

  • Online ISBN: 978-3-540-49097-5

  • eBook Packages: Springer Book Archive
