Abstract
Bayesian methods allow for a simple and intuitive representation of the function spaces used by kernel methods. This chapter describes the basic principles of Gaussian processes, their implementation, and their connection to other kernel-based Bayesian estimation methods, such as the relevance vector machine.
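As a concrete pointer to what the chapter covers, here is a minimal sketch of Gaussian process regression with a squared-exponential kernel, computing the posterior mean and variance via a Cholesky factorization. The function names, kernel choice, and hyperparameter values (`length_scale`, `noise_var`) are illustrative assumptions, not details taken from the chapter.

```python
# Minimal Gaussian process regression sketch (illustrative, not the
# chapter's own code): posterior mean and variance under an RBF kernel
# and Gaussian observation noise, via a Cholesky factorization.
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel k(a, b) = exp(-||a - b||^2 / (2 l^2))."""
    sq_dists = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * sq_dists / length_scale**2)

def gp_predict(X_train, y_train, X_test, noise_var=0.1):
    """Return the GP posterior mean and variance at the test inputs."""
    K = rbf_kernel(X_train, X_train) + noise_var * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_test)
    L = np.linalg.cholesky(K)                                  # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # alpha = K^{-1} y
    mean = K_s.T @ alpha                                       # E[f*] = K_*^T K^{-1} y
    v = np.linalg.solve(L, K_s)
    var = rbf_kernel(X_test, X_test).diagonal() - np.sum(v**2, axis=0)
    return mean, var

# Toy usage: noisy samples of a sine function.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(20, 1))
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(20)
X_star = np.linspace(-3, 3, 5).reshape(-1, 1)
mu, var = gp_predict(X, y, X_star)
print(mu, var)
```

For the classification setting and the sparse Bayesian methods the chapter also discusses (such as the relevance vector machine), the Gaussian likelihood above is replaced and the posterior is approximated, for instance via the Laplace approximation.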
Keywords
- Gaussian Process
- Neural Information Processing System
- Reproducing Kernel Hilbert Space
- Relevance Vector Machine
- Laplace Approximation
The present article is based on [62].
References
K. P. Bennett, A. Demiriz, and J. Shawe-Taylor. A column generation algorithm for boosting. In P. Langley, editor, Proceedings of the International Conference on Machine Learning, San Francisco, 2000. Morgan Kaufmann Publishers.
K. P. Bennett and O. L. Mangasarian. Robust linear programming discrimination of two linearly inseparable sets. Optimization Methods and Software, 1:23–34, 1992.
C. M. Bishop. Neural Networks for Pattern Recognition. Clarendon Press, Oxford, 1995.
C. M. Bishop and M. E. Tipping. Variational relevance vector machines. In Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence (UAI 2000), pages 46–53, 2000.
P. S. Bradley and O. L. Mangasarian. Feature selection via concave minimization and support vector machines. In J. Shavlik, editor, Proceedings of the International Conference on Machine Learning, pages 82–90, San Francisco, California, 1998. Morgan Kaufmann Publishers. ftp://ftp.cs.wisc.edu/math-prog/tech-reports/98-03.ps.Z.
S. Chen, D. Donoho, and M. Saunders. Atomic decomposition by basis pursuit. SIAM Journal on Scientific Computing, 20(1):33–61, 1999.
T. M. Cover and J. A. Thomas. Elements of Information Theory. John Wiley and Sons, New York, 1991.
H. Cramér. Mathematical Methods of Statistics. Princeton University Press, 1946.
L. Csató and M. Opper. Sparse representation for Gaussian process models. In T. K. Leen, T. G. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems 13, pages 444–450. MIT Press, 2001.
A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society B, 39(1):1–22, 1977.
S. Duane, A. D. Kennedy, B. J. Pendleton, and D. Roweth. Hybrid Monte Carlo. Physics Letters B, 195:216–222, 1987.
S. Fine and K. Scheinberg. Efficient SVM training using low-rank kernel representation. Technical report, IBM Watson Research Center, New York, 2000.
R. Fletcher. Practical Methods of Optimization. John Wiley and Sons, New York, 1989.
G. Fung and O. L. Mangasarian. Data selection for support vector machine classifiers. In Proceedings of KDD’2000, 2000. Also: Data Mining Institute Technical Report 00-02, University of Wisconsin, Madison.
A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin. Bayesian Data Analysis. Chapman and Hall, London, 1995.
M. Gibbs and D. J. C. MacKay. Variational Gaussian process classifiers. Technical report, Cavendish Laboratory, Cambridge, UK, 1998.
M. N. Gibbs. Bayesian Gaussian Methods for Regression and Classification. PhD thesis, University of Cambridge, 1997.
M. N. Gibbs and D. J. C. MacKay. Efficient implementation of Gaussian processes. Technical report, Cavendish Laboratory, Cambridge, UK, 1997. Available at http://www.wol.ra.phy.cam.ac.uk/mng10/GP/.
P. E. Gill, W. Murray, and M. H. Wright. Practical Optimization. Academic Press, 1981.
F. Girosi. Models of noise and robust estimates. A.I. Memo 1287, Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 1991.
D. Goldfarb and K. Scheinberg. A product-form Cholesky factorization method for handling dense columns in interior point methods for linear programming. Technical report, IBM Watson Research Center, Yorktown Heights, 2001.
G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins University Press, Baltimore, MD, 3rd edition, 1996.
I. S. Gradshteyn and I. M. Ryzhik. Table of integrals, series, and products. Academic Press, New York, 1981.
T. Graepel, R. Herbrich, P. Bollmann-Sdorra, and K. Obermayer. Classification on pairwise proximity data. In M. S. Kearns, S. A. Solla, and D. A. Cohn, editors, Advances in Neural Information Processing Systems 11, pages 438–444, Cambridge, MA, 1999. MIT Press.
D. Haussler. Convolution kernels on discrete structures. Technical Report UCSC-CRL-99-10, Computer Science Department, UC Santa Cruz, 1999.
R. Herbrich. Learning Kernel Classifiers: Theory and Algorithms. MIT Press, 2002.
R. Herbrich, T. Graepel, and C. Campbell. Bayes point machines: Estimating the Bayes point in kernel space. In Proceedings of the IJCAI Workshop on Support Vector Machines, pages 23–27, 1999.
R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, Cambridge, 1985.
P. J. Huber. Robust statistics: a review. Annals of Mathematical Statistics, 43:1041–1067, 1972.
T. Jaakkola, M. Meila, and T. Jebara. Maximum entropy discrimination. Technical Report AITR-1668, Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 1999.
T. S. Jaakkola and D. Haussler. Exploiting generative models in discriminative classifiers. In M. S. Kearns, S. A. Solla, and D. A. Cohn, editors, Advances in Neural Information Processing Systems 11, pages 487–493, Cambridge, MA, 1999. MIT Press.
T. S. Jaakkola and M. I. Jordan. Computing upper and lower bounds on likelihoods in intractable networks. In Proceedings of the 12th Conference on Uncertainty in AI. Morgan Kaufmann Publishers, 1996.
W. James and C. Stein. Estimation with quadratic loss. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, volume 1, pages 361–380, Berkeley, 1961. University of California Press.
T. Jebara and T. Jaakkola. Feature selection and dualities in maximum entropy discrimination. In Uncertainty in Artificial Intelligence, 2000.
M. I. Jordan and C. M. Bishop. An Introduction to Probabilistic Graphical Models. MIT Press, 2002.
M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. K. Saul. An introduction to variational methods for graphical models. In M. I. Jordan, editor, Learning in Graphical Models, pages 105–162. Kluwer Academic, 1998.
M. S. Lewicki and T. J. Sejnowski. Learning nonlinear overcomplete representations for efficient coding. In M. I. Jordan, M. J. Kearns, and S. A. Solla, editors, Advances in Neural Information Processing Systems 10, pages 556–562, Cambridge, MA, 1998. MIT Press.
D. G. Luenberger. Introduction to Linear and Nonlinear Programming. Addison-Wesley, Reading, MA, 1973.
H. Lütkepohl. Handbook of Matrices. John Wiley and Sons, Chichester, 1996.
D. J. C. MacKay. Bayesian Methods for Adaptive Models. PhD thesis, Computation and Neural Systems, California Institute of Technology, Pasadena, CA, 1991.
D. J. C. MacKay. The evidence framework applied to classification networks. Neural Computation, 4(5):720–736, 1992.
S. Mallat and Z. Zhang. Matching pursuits with time-frequency dictionaries. IEEE Transactions on Signal Processing, 41:3397–3415, 1993.
O. L. Mangasarian. Linear and nonlinear separation of patterns by linear programming. Operations Research, 13:444–452, 1965.
B. K. Natarajan. Sparse approximate solutions to linear systems. SIAM Journal on Computing, 25(2):227–234, 1995.
R. Neal. Priors for infinite networks. Technical Report CRG-TR-94-1, Dept. of Computer Science, University of Toronto, 1994.
R. Neal. Bayesian Learning in Neural Networks. Springer, 1996.
R. M. Neal. Probabilistic inference using Markov chain Monte Carlo methods. Technical Report CRG-TR-93-1, Dept. of Computer Science, University of Toronto, 1993.
B. A. Olshausen and D. J. Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381:607–609, 1996.
M. Opper and O. Winther. Mean field methods for classification with Gaussian processes. In M. S. Kearns, S. A. Solla, and D. A. Cohn, editors, Advances in Neural Information Processing Systems 11, pages 309–315, Cambridge, MA, 1999. MIT Press.
M. Opper and O. Winther. Gaussian processes and SVM: mean field and leave-one-out. In A. J. Smola, P. L. Bartlett, B. Schölkopf, and D. Schuurmans, editors, Advances in Large Margin Classifiers, pages 311–326, Cambridge, MA, 2000. MIT Press.
J. Platt. Probabilities for SV machines. In A. J. Smola, P. L. Bartlett, B. Schölkopf, and D. Schuurmans, editors, Advances in Large Margin Classifiers, pages 61–73, Cambridge, MA, 2000. MIT Press.
T. Poggio. On optimal nonlinear associative recall. Biological Cybernetics, 19:201–209, 1975.
W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. Numerical Recipes in C: The Art of Scientific Computing (2nd ed.). Cambridge University Press, Cambridge, 1992. ISBN 0-521-43108-5.
C. Rasmussen. Evaluation of Gaussian Processes and Other Methods for Non-Linear Regression. PhD thesis, Department of Computer Science, University of Toronto, 1996. ftp://ftp.cs.toronto.edu/pub/carl/thesis.ps.gz.
G. Rätsch, S. Mika, and A. J. Smola. Adapting codes and embeddings for polychotomies. In Neural Information Processing Systems, volume 15. MIT Press, 2002. To appear.
B. D. Ripley. Pattern Recognition and Neural Networks. Cambridge University Press, Cambridge, 1996.
R. T. Rockafellar. Convex Analysis, volume 28 of Princeton Mathematics Series. Princeton University Press, 1970.
P. Ruján and M. Marchand. Computing the Bayes kernel classifier. In A. J. Smola, P. L. Bartlett, B. Schölkopf, and D. Schuurmans, editors, Advances in Large Margin Classifiers, pages 329–347, Cambridge, MA, 2000. MIT Press.
P. Ruján. Playing billiards in version space. Neural Computation, 9:99–122, 1997.
B. Schölkopf, S. Mika, C. Burges, P. Knirsch, K.-R. Müller, G. Rätsch, and A. Smola. Input space vs. feature space in kernel-based methods. IEEE Transactions on Neural Networks, 10(5):1000–1017, 1999.
B. Schölkopf, A. Smola, and K.-R. Müller. Kernel principal component analysis. In B. Schölkopf, C. J. C. Burges, and A. J. Smola, editors, Advances in Kernel Methods-Support Vector Learning, pages 327–352. MIT Press, Cambridge, MA, 1999.
B. Schölkopf and A. J. Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002.
M. Seeger. Bayesian methods for support vector machines and Gaussian processes. Master’s thesis, University of Edinburgh, Division of Informatics, 1999.
J. Skilling. Maximum Entropy and Bayesian Methods. Cambridge University Press, 1988.
A. Smola, B. Schölkopf, and G. Rätsch. Linear programs for automatic accuracy control in regression. In Ninth International Conference on Artificial Neural Networks, Conference Publications No. 470, pages 575–580, London, 1999. IEE.
A. J. Smola. Learning with Kernels. PhD thesis, Technische Universität Berlin, 1998. GMD Research Series No. 25.
A. J. Smola and P. L. Bartlett. Sparse greedy Gaussian process regression. In T. K. Leen, T. G. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems 13, pages 619–625. MIT Press, 2001.
A. J. Smola and B. Schölkopf. Sparse greedy matrix approximation for machine learning. In P. Langley, editor, Proceedings of the International Conference on Machine Learning, pages 911–918, San Francisco, 2000. Morgan Kaufmann Publishers.
A. J. Smola and S. V. N. Vishwanathan. Cholesky factorization for rank-k modifications of diagonal matrices. SIAM Journal on Matrix Analysis and Applications, 2002. Submitted.
C. S. Ong, A. J. Smola, and R. C. Williamson. Superkernels. In Neural Information Processing Systems, volume 15. MIT Press, 2002. To appear.
D. J. Spiegelhalter and S. L. Lauritzen. Sequential updating of conditional probabilities on directed graphical structures. Networks, 20:579–605, 1990.
J. Stoer and R. Bulirsch. Introduction to Numerical Analysis. Springer, New York, second edition, 1993.
M. Tipping. Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research, 1:211–244, 2001.
V. Tresp. A Bayesian committee machine. Neural Computation, 12(11):2719–2741, 2000.
V. Vapnik. The Nature of Statistical Learning Theory. Springer, New York, 1995.
V. Vapnik, S. Golowich, and A. Smola. Support vector method for function approximation, regression estimation, and signal processing. In M. C. Mozer, M. I. Jordan, and T. Petsche, editors, Advances in Neural Information Processing Systems 9, pages 281–287, Cambridge, MA, 1997. MIT Press.
G. Wahba. Spline Models for Observational Data, volume 59 of CBMS-NSF Regional Conference Series in Applied Mathematics. SIAM, Philadelphia, 1990.
C. Watkins. Dynamic alignment kernels. Technical Report CSD-TR-98-11, Royal Holloway, University of London, Egham, Surrey, UK, 1999.
G. N. Watson. A Treatise on the Theory of Bessel Functions. Cambridge University Press, Cambridge, UK, 2nd edition, 1958.
C. K. I. Williams. Prediction with Gaussian processes: From linear regression to linear prediction and beyond. In M. I. Jordan, editor, Learning in Graphical Models, pages 599–621. MIT Press, 1999.
C. K. I. Williams and C. E. Rasmussen. Gaussian processes for regression. In D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, editors, Advances in Neural Information Processing Systems 8, pages 514–520, Cambridge, MA, 1996. MIT Press.
C. K. I. Williams and M. Seeger. Using the Nyström method to speed up kernel machines. In T. K. Leen, T. G. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems 13, pages 682–688, Cambridge, MA, 2001. MIT Press.
C. K. I. Williams and D. Barber. Bayesian classification with Gaussian processes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(12):1342–1351, 1998.
T. Zhang. Some sparse approximation bounds for regression problems. In Proc. 18th International Conf. on Machine Learning, pages 624–631. Morgan Kaufmann, San Francisco, CA, 2001.
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
Cite this chapter
Smola, A. J., Schölkopf, B. (2003). Bayesian Kernel Methods. In: Mendelson, S., Smola, A. J. (eds) Advanced Lectures on Machine Learning. Lecture Notes in Computer Science, vol 2600. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36434-X_3
DOI: https://doi.org/10.1007/3-540-36434-X_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00529-2
Online ISBN: 978-3-540-36434-4