Abstract
In many regression tasks, in addition to an accurate estimate of the conditional mean of the target distribution, an indication of the predictive uncertainty is also required. There are two principal sources of this uncertainty: the noise process contaminating the data and the uncertainty in estimating the model parameters from a limited sample of training data. Both can be summarised in the predictive variance, which can then be used to give confidence intervals. In this paper, we present various schemes for providing predictive variances for kernel ridge regression, especially in the case of heteroscedastic regression, where the variance of the noise process contaminating the data is a smooth function of the explanatory variables. The use of leave-one-out cross-validation is shown to eliminate the bias inherent in estimates of the predictive variance. Results obtained on all three regression tasks comprising the Predictive Uncertainty Challenge demonstrate the value of this approach.
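To make the abstract's idea concrete, the following is a minimal sketch, not the estimator developed in the paper: it fits a dual-variable kernel ridge regression model, computes the closed-form leave-one-out residuals e_i = (y_i − f(x_i)) / (1 − H_ii) with hat matrix H = K(K + λI)⁻¹ (a standard PRESS-type identity for ridge estimators), and then fits a second kernel machine to the log squared residuals to obtain a smooth, input-dependent noise-variance estimate. All function names, parameter values, and the toy data are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian RBF kernel matrix between the rows of A and B."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * d2)

def krr_fit(X, y, lam, gamma=1.0):
    """Dual-variable kernel ridge regression: alpha = (K + lam*I)^{-1} y."""
    K = rbf_kernel(X, X, gamma)
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)
    return K, alpha

def loo_residuals(K, y, alpha, lam):
    """Closed-form leave-one-out residuals for ridge-type estimators:
    e_i = (y_i - f(x_i)) / (1 - H_ii), where H = K (K + lam*I)^{-1}."""
    H = K @ np.linalg.inv(K + lam * np.eye(len(y)))
    return (y - K @ alpha) / (1.0 - np.diag(H))

# Toy heteroscedastic data: the noise standard deviation grows
# smoothly with x, as in the setting the abstract describes.
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-3, 3, 200))[:, None]
y = np.sinc(X[:, 0]) + (0.05 + 0.1 * (X[:, 0] + 3)) * rng.standard_normal(200)

# Mean model, then LOO residuals (roughly unbiased, unlike training residuals).
K, alpha = krr_fit(X, y, lam=0.1, gamma=0.5)
e = loo_residuals(K, y, alpha, lam=0.1)

# Second KRR model on log squared LOO residuals gives a smooth,
# strictly positive estimate of the input-dependent noise variance.
Kz, beta = krr_fit(X, np.log(e**2 + 1e-12), lam=1.0, gamma=0.5)
noise_var = np.exp(Kz @ beta)
```

The key point illustrated is the one the abstract makes: ordinary training residuals systematically understate the noise level because the model has already fitted part of the noise, whereas leave-one-out residuals remove that bias at negligible extra cost, since they are available in closed form for kernel ridge regression.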
© 2006 Springer-Verlag Berlin Heidelberg
Cite this paper
Cawley, G.C., Talbot, N.L.C., Chapelle, O. (2006). Estimating Predictive Variances with Kernel Ridge Regression. In: Quiñonero-Candela, J., Dagan, I., Magnini, B., d'Alché-Buc, F. (eds) Machine Learning Challenges. Evaluating Predictive Uncertainty, Visual Object Classification, and Recognising Textual Entailment. MLCW 2005. Lecture Notes in Computer Science, vol. 3944. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11736790_5
Print ISBN: 978-3-540-33427-9
Online ISBN: 978-3-540-33428-6