Abstract
In this paper we propose an efficient method for model selection. We apply this method to select the degree of regularization, and either the number of basis functions or the parameters of a kernel function to be used in a regression of the data. The method combines the well-known Bayesian approach with the maximum likelihood method. The Bayesian approach is applied to a set of models with conventional priors that depend on unknown parameters, and the maximum likelihood method is used to determine these parameters. When these parameter values determine the complexity of a model, the method thereby yields a choice of model complexity. Under the assumption of Gaussian noise the method leads to a computationally feasible procedure for determining the optimum number of basis functions and the degree of regularization in ridge regression. This procedure is an inexpensive alternative to cross-validation. In the non-Gaussian case we show connections to support vector methods. We also present experimental results comparing this method to other methods of model complexity selection, including cross-validation.
A very preliminary version of this research [CCGH99] was presented at the IJCAI99 Workshop on Support Vector Machines.
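The Bayes/maximum-likelihood combination described in the abstract can be illustrated with a small sketch: for a Gaussian prior on the weights and Gaussian noise, the marginal likelihood (evidence) of the data is available in closed form, and maximizing it over the prior variance and the number of basis functions selects both the ridge penalty and the model size. The code below is a minimal illustration of this idea, not the authors' exact procedure; the polynomial basis, noise level, and search grid are assumptions made for the example.

```python
import numpy as np

def log_evidence(Phi, y, noise_var, prior_var):
    """Log marginal likelihood of y for the model
    y = Phi w + eps, with w ~ N(0, prior_var * I) and eps ~ N(0, noise_var * I).
    Integrating out w gives y ~ N(0, C) with C = noise_var*I + prior_var*Phi Phi^T."""
    n = len(y)
    C = noise_var * np.eye(n) + prior_var * (Phi @ Phi.T)
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + y @ np.linalg.solve(C, y))

# Toy data: a noisy cubic (assumed for illustration).
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 40)
y = x**3 - 0.5 * x + 0.1 * rng.standard_normal(40)

# Jointly search over the number of basis functions and the prior variance;
# the implied ridge penalty is noise_var / prior_var.
best = None
for degree in range(1, 8):
    Phi = np.vander(x, degree + 1, increasing=True)
    for prior_var in 10.0 ** np.arange(-3, 3):
        ev = log_evidence(Phi, y, noise_var=0.01, prior_var=prior_var)
        if best is None or ev > best[0]:
            best = (ev, degree, prior_var)

print(best)  # (log evidence, chosen degree, chosen prior variance)
```

The grid search here stands in for the maximization over prior parameters; the key point is that each candidate model is scored by its evidence rather than by held-out error, which is what makes the procedure an inexpensive alternative to cross-validation.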
References
M. A. Aizerman, E. M. Braverman, and L. I. Rozonoér. Theoretical foundations of the potential function method in pattern recognition learning. Automation and Remote Control, 25: 821–837, 1964.
H. Akaike. Statistical predictor identification. Annals of the Institute for Statistical Mathematics, 22: 203–217, 1970.
C. Blake and C. Merz. UCI repository of machine learning databases, 1998. http://www.ics.uci.edu/~mlearn/MLRepository.html.
A. Chervonenkis, P. Chervonenkis, A. Gammerman, and M. Herbster. A combined Bayesian-maximum likelihood approach to model selection. In Proceedings of the IJCAI99 Workshop on Support Vector Machines, Stockholm, 1999.
V. Cherkassky, F. Mulier, and V. Vapnik. Comparison of VC-method with classical methods for model selection. In Proceedings of the World Congress on Neural Networks, pages 957–962, 1996.
P. Craven and G. Wahba. Smoothing noisy data with spline functions. Numerische Mathematik, 31: 377–403, 1979.
H. Drucker, C. Burges, L. Kaufman, A. Smola, and V. Vapnik. Support vector regression machines. In M. C. Mozer, M. I. Jordan, and T. Petsche, editors, Advances in Neural Information Processing Systems, volume 9, page 155. The MIT Press, 1997.
W. Härdle. Applied Nonparametric Regression. Springer Verlag, Berlin, 1992.
A. E. Hoerl and R. W. Kennard. Ridge regression: biased estimation for nonorthogonal problems. Technometrics, 12: 55–67, 1970.
D. Harrison and D.L. Rubinfeld. Hedonic prices and the demand for clean air. J. Environ. Economics Management, 5: 81–102, 1978.
Michael Kearns, Yishay Mansour, Andrew Y. Ng, and Dana Ron. An experimental and theoretical comparison of model selection methods. Machine Learning, 27: 7–50, 1997.
D. G. Krige. A review of the development of geostatistics in South Africa. In M. Guarascio, M. David, and C. Huijbregts, editors, Advanced Geostatistics in the Mining Industry, pages 279–293. Reidel, 1976.
D. J. C. MacKay. Bayesian interpolation. Neural Computation, 4 (3): 415–447, 1992.
G. Matheron. Principles of geostatistics. Economic Geology, 58: 1246–1266, 1963.
J. Rissanen. Parameter estimation by shortest description of data. Proc DACE Conf RSME, pages 593—?, 1976.
J. Rissanen. Stochastic complexity (with discussion). Journal of the Royal Statistical Society series B, 49: 223–239, 1987.
G. Schwarz. Estimating the dimension of a model. Annals of Statistics, 6: 461–464, 1978.
M. O. Stitson, A. Gammerman, V. N. Vapnik, V. Vovk, C. Watkins, and J. Weston. Support vector regression with ANOVA decomposition kernels. Technical report, Royal Holloway, University of London, 1997.
C. Saunders, A. Gammerman, and V. Vovk. Ridge regression learning algorithm in dual variables. In Proc. 15th International Conf. on Machine Learning, pages 515–521. Morgan Kaufmann, San Francisco, CA, 1998.
R. Shibata. An optimal selection of regression variables. Biometrika, 68: 45–54, 1981.
V. F. Turchin, V. P. Kozlov, and M. S. Malkevich. Application of mathematical statistics methods for ill-posed problem solving (in Russian). Uspekhi Fiz. Nauk, 102: 345–386, 1970.
V. N. Vapnik. Estimation of Dependencies Based on Empirical Data. Springer-Verlag, Berlin, 1982.
V. Vapnik. Statistical Learning Theory. John Wiley &amp; Sons, 1998.
C. S. Wallace. On the selection of the order of a polynomial model. Technical report, Royal Holloway, 1997.
C. Wallace and D. Boulton. An information measure for classification. Computer Journal, 11(2): 185–195, August 1968.
C. S. Wallace and P. R. Freeman. Estimation and inference by compact encoding (with discussion). Journal of the Royal Statistical Society series B, 49: 240–265, 1987.
C. K. I. Williams. Prediction with Gaussian processes: From linear regression to linear prediction and beyond. Technical report, Aston University, UK, 1997. To appear in: Learning and Inference in Graphical Models, ed. M. I. Jordan, Kluwer, 1998.
Copyright information
© 2001 Springer-Verlag Wien
Cite this chapter
Chervonenkis, A., Gammerman, A., Herbster, M. (2001). A combined Bayes — maximum likelihood method for regression. In: Della Riccia, G., Lenz, HJ., Kruse, R. (eds) Data Fusion and Perception. International Centre for Mechanical Sciences, vol 431. Springer, Vienna. https://doi.org/10.1007/978-3-7091-2580-9_2
Print ISBN: 978-3-211-83683-5
Online ISBN: 978-3-7091-2580-9