Abstract
In this paper we propose an efficient method for model selection. We apply this method to select the degree of regularization, and either the number of basis functions or the parameters of a kernel function to be used in a regression of the data. The method combines the well-known Bayesian approach with the maximum likelihood method. The Bayesian approach is applied to a set of models with conventional priors that depend on unknown parameters, and the maximum likelihood method is used to determine these parameters. When these parameter values determine the complexity of a model, the method thereby yields a choice of model complexity. Under the assumption of Gaussian noise the method leads to a computationally feasible procedure for determining the optimum number of basis functions and the degree of regularization in ridge regression. This procedure is an inexpensive alternative to cross-validation. In the non-Gaussian case we show connections to support vector methods. We also present experimental results comparing this method to other methods of model complexity selection, including cross-validation.
A very preliminary version of this research [CCGH99] was presented at the IJCAI99 Workshop on Support Vector Machines.
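The Bayes/maximum-likelihood combination described in the abstract can be illustrated with a small sketch: for a Gaussian prior on the weights and Gaussian noise, the marginal likelihood (evidence) of the data is available in closed form, and maximizing it over the prior variance and the number of basis functions selects both the ridge penalty and the model size. The code below is a minimal illustration of this idea, not the authors' exact procedure; the polynomial basis, noise level, and search grid are assumptions made for the example.

```python
import numpy as np

def log_evidence(Phi, y, noise_var, prior_var):
    """Log marginal likelihood of y for the model
    y = Phi w + eps, with w ~ N(0, prior_var * I) and eps ~ N(0, noise_var * I).
    Integrating out w gives y ~ N(0, C) with C = noise_var*I + prior_var*Phi Phi^T."""
    n = len(y)
    C = noise_var * np.eye(n) + prior_var * (Phi @ Phi.T)
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + y @ np.linalg.solve(C, y))

# Toy data: a noisy cubic (assumed for illustration).
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 40)
y = x**3 - 0.5 * x + 0.1 * rng.standard_normal(40)

# Jointly search over the number of basis functions and the prior variance;
# the implied ridge penalty is noise_var / prior_var.
best = None
for degree in range(1, 8):
    Phi = np.vander(x, degree + 1, increasing=True)
    for prior_var in 10.0 ** np.arange(-3, 3):
        ev = log_evidence(Phi, y, noise_var=0.01, prior_var=prior_var)
        if best is None or ev > best[0]:
            best = (ev, degree, prior_var)

print(best)  # (log evidence, chosen degree, chosen prior variance)
```

The grid search here stands in for the maximization over prior parameters; the key point is that each candidate model is scored by its evidence rather than by held-out error, which is what makes the procedure an inexpensive alternative to cross-validation.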
References
M. A. Aizerman, E. M. Braverman, and L. I. Rozonoér. Theoretical foundations of the potential function method in pattern recognition learning. Automation and Remote Control, 25: 821–837, 1964.
H. Akaike. Statistical predictor identification. Annals of the Institute for Statistical Mathematics, 22: 203–217, 1970.
C. Blake and C. Merz. UCI repository of machine learning databases, 1998. http://www.ics.uci.edu/~mlearn/MLRepository.html.
A. Chervonenkis, P. Chervonenkis, A. Gammerman, and M. Herbster. A combined Bayesian-maximum likelihood approach to model selection. In Proceedings of the IJCAI99 Workshop on Support Vector Machines, Stockholm, 1999.
V. Cherkassky, F. Mulier, and V. Vapnik. Comparison of VC-method with classical methods for model selection. In Proceedings of the World Congress on Neural Networks, pages 957–962, 1996.
P. Craven and G. Wahba. Smoothing noisy data with spline functions. Numerische Mathematik, 31: 377–403, 1979.
H. Drucker, C. Burges, L. Kaufman, A. Smola, and V. Vapnik. Support vector regression machines. In M. C. Mozer, M. I. Jordan, and T. Petsche, editors, Advances in Neural Information Processing Systems, volume 9, page 155. The MIT Press, 1997.
W. Härdle. Applied Nonparametric Regression. Springer Verlag, Berlin, 1992.
A. E. Hoerl and R. W. Kennard. Ridge regression: biased estimation for nonorthogonal problems. Technometrics, 12: 55–67, 1970.
D. Harrison and D.L. Rubinfeld. Hedonic prices and the demand for clean air. J. Environ. Economics Management, 5: 81–102, 1978.
Michael Kearns, Yishay Mansour, Andrew Y. Ng, and Dana Ron. An experimental and theoretical comparison of model selection methods. Machine Learning, 27: 7–50, 1997.
D. G. Krige. A review of the development of geostatistics in South Africa. In M. Guarascio, M. David, and C. Huijbregts, editors, Advanced Geostatistics in the Mining Industry, pages 279–293. Reidel, 1976.
D. J. C. MacKay. Bayesian interpolation. Neural Computation, 4 (3): 415–447, 1992.
G. Matheron. Principles of geostatistics. Economic Geology, 58: 1246–1266, 1963.
J. Rissanen. Parameter estimation by shortest description of data. Proc DACE Conf RSME, pages 593—?, 1976.
J. Rissanen. Stochastic complexity (with discussion). Journal of the Royal Statistical Society series B, 49: 223–239, 1987.
G. Schwarz. Estimating the dimension of a model. Annals of Statistics, 6: 461–464, 1978.
M. O. Stitson, A. Gammerman, V. N. Vapnik, V. Vovk, C. Watkins, and J. Weston. Support vector regression with ANOVA decomposition kernels. Technical report, Royal Holloway, University of London, 1997.
C. Saunders, A. Gammerman, and V. Vovk. Ridge regression learning algorithm in dual variables. In Proc. 15th International Conf. on Machine Learning, pages 515–521. Morgan Kaufmann, San Francisco, CA, 1998.
R. Shibata. An optimal selection of regression variables. Biometrika, 68: 45–54, 1981.
V. F. Turchin, V. P. Kozlov, and M. S. Malkevich. Application of mathematical statistics methods for ill-posed problem solving (in Russian). Uspekhi Fiz. Nauk, 102: 345–386, 1970.
V. N. Vapnik. Estimation of Dependencies Based on Empirical Data. Springer-Verlag, Berlin, 1982.
V. Vapnik. Statistical Learning Theory. John Wiley &amp; Sons, 1998.
C. S. Wallace. On the selection of the order of a polynomial model. Technical report, Royal Holloway, 1997.
C. Wallace and D. Boulton. An information measure for classification. Computer Journal, 11(2): 185–195, August 1968.
C. S. Wallace and P. R. Freeman. Estimation and inference by compact encoding (with discussion). Journal of the Royal Statistical Society series B, 49: 240–265, 1987.
C. K. I. Williams. Prediction with Gaussian processes: From linear regression to linear prediction and beyond. Technical report, Aston University, UK, 1997. To appear in: Learning and Inference in Graphical Models, ed. M. I. Jordan, Kluwer, 1998.
Copyright information
© 2001 Springer-Verlag Wien
Cite this chapter
Chervonenkis, A., Gammerman, A., Herbster, M. (2001). A combined Bayes — maximum likelihood method for regression. In: Della Riccia, G., Lenz, HJ., Kruse, R. (eds) Data Fusion and Perception. International Centre for Mechanical Sciences, vol 431. Springer, Vienna. https://doi.org/10.1007/978-3-7091-2580-9_2
Print ISBN: 978-3-211-83683-5
Online ISBN: 978-3-7091-2580-9