Abstract
Wahba’s classical representer theorem states that the solutions of certain risk minimization problems involving an empirical risk term and a quadratic regularizer can be written as expansions in terms of the training examples. We generalize the theorem to a larger class of regularizers and empirical risk terms, and give a self-contained proof utilizing the feature space associated with a kernel. The result shows that a wide range of problems have optimal solutions that live in the finite-dimensional span of the training examples mapped into feature space, thus enabling us to carry out kernel algorithms independently of the (potentially infinite) dimensionality of the feature space.
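To make the abstract's claim concrete, the result can be sketched as follows (the notation is the standard one for this setting; the paper states the precise conditions on the cost function and regularizer). For a positive definite kernel $k$ on a set $\mathcal{X}$, training data $(x_1, y_1), \ldots, (x_m, y_m)$, an arbitrary cost function $c$, and a strictly monotonically increasing regularizer $g$, every minimizer over the reproducing kernel Hilbert space $\mathcal{H}_k$ of

\[
c\big((x_1, y_1, f(x_1)), \ldots, (x_m, y_m, f(x_m))\big) + g\big(\lVert f \rVert_{\mathcal{H}_k}\big)
\]

admits a representation of the form

\[
f(\cdot) = \sum_{i=1}^{m} \alpha_i \, k(x_i, \cdot).
\]

As an illustration of the practical consequence (a minimal sketch of ours, not code from the paper), kernel ridge regression instantiates the theorem with squared loss and a quadratic regularizer: the search over the possibly infinite-dimensional $\mathcal{H}_k$ reduces to solving an $m \times m$ linear system for the expansion coefficients, so only kernel evaluations between training points are ever needed.

```python
# Minimal kernel ridge regression sketch (our illustration): by the
# representer theorem, the optimal f lies in the span of the mapped
# training points, so we only solve for the m coefficients alpha in
# f(x) = sum_i alpha_i k(x_i, x). The feature space itself never appears.
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gram matrix of the Gaussian kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def fit(X, y, lam=1e-2, gamma=1.0):
    """Solve (K + lam * m * I) alpha = y for the expansion coefficients."""
    m = len(X)
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * m * np.eye(m), y)

def predict(X_train, alpha, X_new, gamma=1.0):
    """Evaluate f(x) = sum_i alpha_i k(x_i, x) at the new inputs."""
    return rbf_kernel(X_new, X_train, gamma) @ alpha

# Usage: fit a noisy sine curve; only the 50 x 50 Gram matrix is formed,
# regardless of the dimensionality of the kernel's feature space.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)
alpha = fit(X, y)
print("train MSE:", np.mean((predict(X, alpha, X) - y) ** 2))
```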
Keywords
- Feature Space
- Kernel Method
- Neural Information Processing System
- Reproducing Kernel Hilbert Space
- Kernel Principal Component Analysis
References
M.A. Aizerman, É.M. Braverman, and L.I. Rozonoér. Theoretical foundations of the potential function method in pattern recognition learning. Automation and Remote Control, 25:821–837, 1964.
N. Aronszajn. Theory of reproducing kernels. Transactions of the American Mathematical Society, 68:337–404, 1950.
P.L. Bartlett and J. Shawe-Taylor. Generalization performance of support vector machines and other pattern classifiers. In B. Schölkopf, C.J.C. Burges, and A.J. Smola, editors, Advances in Kernel Methods — Support Vector Learning, pages 43–54, Cambridge, MA, 1999. MIT Press.
C. Berg, J.P.R. Christensen, and P. Ressel. Harmonic Analysis on Semigroups. Springer-Verlag, New York, 1984.
B.E. Boser, I.M. Guyon, and V.N. Vapnik. A training algorithm for optimal margin classifiers. In D. Haussler, editor, Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory, pages 144–152, Pittsburgh, PA, July 1992. ACM Press.
O. Bousquet and A. Elisseeff. Algorithmic stability and generalization performance. In T.K. Leen, T.G. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems 13. MIT Press, 2001.
D. Cox and F. O'Sullivan. Asymptotic analysis of penalized likelihood and related estimators. Annals of Statistics, 18:1676–1695, 1990.
L. Csató and M. Opper. Sparse representation for Gaussian process models. In T.K. Leen, T.G. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems 13. MIT Press, 2001.
Y. Freund and R.E. Schapire. Large margin classification using the perceptron algorithm. In J. Shavlik, editor, Machine Learning: Proceedings of the Fifteenth International Conference, San Francisco, CA, 1998. Morgan Kaufmann.
T.-T. Frieß, N. Cristianini, and C. Campbell. The kernel adatron algorithm: A fast and simple learning procedure for support vector machines. In J. Shavlik, editor, Machine Learning: Proceedings of the Fifteenth International Conference, pages 188–196, San Francisco, CA, 1998. Morgan Kaufmann.
C. Gentile. Approximate maximal margin classification with respect to an arbitrary norm. Unpublished.
F. Girosi, M. Jones, and T. Poggio. Regularization theory and neural networks architectures. Neural Computation, 7(2):219–269, 1995.
D. Haussler. Convolution kernels on discrete structures. Technical Report UCSC-CRL-99-10, Computer Science Department, University of California at Santa Cruz, 1999.
G.S. Kimeldorf and G. Wahba. A correspondence between Bayesian estimation on stochastic processes and smoothing by splines. Annals of Mathematical Statistics, 41:495–502, 1970.
G.S. Kimeldorf and G. Wahba. Some results on Tchebycheffian spline functions. J. Math. Anal. Applic., 33:82–95, 1971.
J. Kivinen, A.J. Smola, P. Wankadia, and R.C. Williamson. On-line algorithms for kernel methods. In preparation, 2001.
A. Kowalczyk. Maximal margin perceptron. In A.J. Smola, P.L. Bartlett, B. Schölkopf, and D. Schuurmans, editors, Advances in Large Margin Classifiers, pages 75–113, Cambridge, MA, 2000. MIT Press.
H. Lodhi, J. Shawe-Taylor, N. Cristianini, and C. Watkins. Text classification using string kernels. Technical Report 2000-79, NeuroCOLT, 2000. Published in: T.K. Leen, T.G. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems 13, MIT Press, 2001.
O.L. Mangasarian. Nonlinear Programming. McGraw-Hill, New York, NY, 1969.
J. Mercer. Functions of positive and negative type and their connection with the theory of integral equations. Philosophical Transactions of the Royal Society, London, A 209:415–446, 1909.
B. Schölkopf, A. Smola, and K.-R. Müller. Kernel principal component analysis. In B. Schölkopf, C.J.C. Burges, and A.J. Smola, editors, Advances in Kernel Methods — Support Vector Learning, pages 327–352, Cambridge, MA, 1999. MIT Press.
A. Smola, T. Frieß, and B. Schölkopf. Semiparametric support vector and linear programming machines. In M.S. Kearns, S.A. Solla, and D.A. Cohn, editors, Advances in Neural Information Processing Systems 11, pages 585–591, Cambridge, MA, 1999. MIT Press.
V. Vapnik. The Nature of Statistical Learning Theory. Springer, NY, 1995.
G. Wahba. Spline Models for Observational Data, volume 59 of CBMS-NSF Regional Conference Series in Applied Mathematics. SIAM, Philadelphia, 1990.
G. Wahba. Support vector machines, reproducing kernel Hilbert spaces and the randomized GACV. In B. Schölkopf, C.J.C. Burges, and A.J. Smola, editors, Advances in Kernel Methods — Support Vector Learning, pages 69–88, Cambridge, MA, 1999. MIT Press.
C. Watkins. Dynamic alignment kernels. In A.J. Smola, P.L. Bartlett, B. Schölkopf, and D. Schuurmans, editors, Advances in Large Margin Classifiers, pages 39–50, Cambridge, MA, 2000. MIT Press.
C.K.I. Williams. Prediction with Gaussian processes: From linear regression to linear prediction and beyond. In M.I. Jordan, editor, Learning and Inference in Graphical Models. Kluwer, 1998.
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
Cite this paper
Schölkopf, B., Herbrich, R., Smola, A.J. (2001). A Generalized Representer Theorem. In: Helmbold, D., Williamson, B. (eds.) Computational Learning Theory. COLT 2001. Lecture Notes in Computer Science, vol. 2111. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44581-1_27
DOI: https://doi.org/10.1007/3-540-44581-1_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42343-0
Online ISBN: 978-3-540-44581-4