A combined Bayes — maximum likelihood method for regression

Chapter in Data Fusion and Perception
Part of the book series: International Centre for Mechanical Sciences (CISM, volume 431)

Abstract

In this paper we propose an efficient method for model selection. We apply this method to select the degree of regularization, and either the number of basis functions or the parameters of a kernel function, to be used in a regression of the data. The method combines the well-known Bayesian approach with the maximum likelihood method. The Bayesian approach is applied to a set of models with conventional priors that depend on unknown parameters, and the maximum likelihood method is used to determine these parameters. When the parameter values determine the complexity of a model, this procedure thereby also selects the model complexity. Under the assumption of Gaussian noise the method leads to a computationally feasible procedure for determining the optimum number of basis functions and the degree of regularization in ridge regression. This procedure is an inexpensive alternative to cross-validation. In the non-Gaussian case we show connections to support vector methods. We also present experimental results comparing this method to other methods of model complexity selection, including cross-validation.

A very preliminary version of this research [CCGH99] was presented at the IJCAI99 Workshop on Support Vector Machines.
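The Gaussian-noise procedure the abstract describes amounts to empirical Bayes: place a Gaussian prior on the regression weights, integrate them out, and choose the prior/regularization parameters (and with them the model complexity) by maximizing the resulting marginal likelihood. The sketch below illustrates that idea for polynomial ridge regression; it is not the authors' exact algorithm, and the function names, grid search, polynomial basis, and assumption of a known noise variance sigma2 are all illustrative.

```python
import numpy as np

def log_evidence(Phi, y, alpha, sigma2):
    """Log marginal likelihood of y when y = Phi @ w + noise,
    with prior w ~ N(0, I / alpha) and noise ~ N(0, sigma2 * I),
    so that y ~ N(0, C) with C = sigma2 * I + Phi @ Phi.T / alpha."""
    n = Phi.shape[0]
    C = sigma2 * np.eye(n) + (Phi @ Phi.T) / alpha
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + y @ np.linalg.solve(C, y))

def select_model(x, y, sigma2, degrees, alphas):
    """Grid search over polynomial degree (number of basis functions)
    and ridge parameter alpha, keeping the pair with the highest evidence."""
    best = (-np.inf, None, None)
    for d in degrees:
        Phi = np.vander(x, d + 1, increasing=True)  # columns 1, x, ..., x^d
        for alpha in alphas:
            ev = log_evidence(Phi, y, alpha, sigma2)
            if ev > best[0]:
                best = (ev, d, alpha)
    return best  # (log evidence, degree, alpha)

# Toy usage: noisy samples of a cubic; the evidence typically peaks near degree 3.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 40)
y = x**3 - 0.5 * x + 0.1 * rng.standard_normal(x.size)
print(select_model(x, y, sigma2=0.01, degrees=range(1, 9), alphas=np.logspace(-3, 3, 13)))
```

Because the marginal likelihood penalizes needless flexibility, the search settles on a modest degree and a moderate alpha without any held-out data, which is the sense in which such a procedure is an inexpensive alternative to cross-validation.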


References

  1. M. A. Aizerman, E. M. Braverman, and L. I. Rozonoér. Theoretical foundations of the potential function method in pattern recognition learning. Automation and Remote Control, 25: 821–837, 1964.

  2. H. Akaike. Statistical predictor identification. Annals of the Institute of Statistical Mathematics, 22: 203–217, 1970.

  3. C. Blake and C. Merz. UCI repository of machine learning databases, 1998. http://www.ics.uci.edu/~mlearn/MLRepository.html.

  4. A. Chervonenkis, P. Chervonenkis, A. Gammerman, and M. Herbster. A combined Bayesian–maximum likelihood approach to model selection. In Proceedings of IJCAI99 Workshop on Support Vector Machines, Stockholm, 1999.

  5. V. Cherkassky, F. Mulier, and V. Vapnik. Comparison of VC-method with classical methods for model selection. In Proceedings of the World Congress on Neural Networks, pages 957–962, 1996.

  6. P. Craven and G. Wahba. Smoothing noisy data with spline functions. Numerische Mathematik, 31: 377–403, 1979.


  7. H. Drucker, C. Burges, L. Kaufman, A. Smola, and V. Vapnik. Support vector regression machines. In M. C. Mozer, M. I. Jordan, and T. Petsche, editors, Advances in Neural Information Processing Systems, volume 9, page 155. The MIT Press, 1997.


  8. W. Härdle. Applied Nonparametric Regression. Springer-Verlag, Berlin, 1992.

  9. A. E. Hoerl and R. W. Kennard. Ridge regression: biased estimation for nonorthogonal problems. Technometrics, 12: 55–67, 1970.


  10. D. Harrison and D.L. Rubinfeld. Hedonic prices and the demand for clean air. J. Environ. Economics Management, 5: 81–102, 1978.


  11. Michael Kearns, Yishay Mansour, Andrew Y. Ng, and Dana Ron. An experimental and theoretical comparison of model selection methods. Machine Learning, 27: 7–50, 1997.


  12. D. G. Krige. A review of the development of geostatistics in South Africa. In M. Guarascio, M. David, and C. Huijbregts, editors, Advanced Geostatistics in the Mining Industry, pages 279–293. Reidel, 1976.

  13. D. J. C. MacKay. Bayesian interpolation. Neural Computation, 4 (3): 415–447, 1992.


  14. G. Matheron. Principles of geostatistics. Economic Geology, 58: 1246–1266, 1963.

  15. J. Rissanen. Parameter estimation by shortest description of data. Proc. DACE Conf. RSME, pages 593–?, 1976.

  16. J. Rissanen. Stochastic complexity (with discussion). Journal of the Royal Statistical Society series B, 49: 223–239, 1987.


  17. G. Schwarz. Estimating the dimension of a model. Annals of Statistics, 6: 461–464, 1978.

  18. M. O. Stitson, A. Gammerman, V. N. Vapnik, V. Vovk, C. Watkins, and J. Weston. Support vector regression with ANOVA decomposition kernels. Technical report, Royal Holloway, University of London, 1997.

  19. C. Saunders, A. Gammerman, and V. Vovk. Ridge regression learning algorithm in dual variables. In Proc. 15th International Conf. on Machine Learning, pages 515–521. Morgan Kaufmann, San Francisco, CA, 1998.

  20. R. Shibata. An optimal selection of regression variables. Biometrika, 68: 45–54, 1981.

  21. V. F. Turchin, V. P. Kozlov, and M. S. Malkevich. Application of mathematical statistics methods for ill-posed problem solving (in Russian). Uspekhi Fiz. Nauk, 102: 345–386, 1970.

  22. V. N. Vapnik. Estimation of Dependencies Based on Empirical Data. Springer-Verlag, Berlin, 1982.


  23. V. Vapnik. Statistical Learning Theory. John Wiley & Sons, 1998.

  24. C. S. Wallace. On the selection of the order of a polynomial model. Technical report, Royal Holloway, University of London, 1997.

  25. C. Wallace and D. Boulton. An information measure for classification. Computer Journal, 11(2): 185–195, August 1968.

  26. C. S. Wallace and P. R. Freeman. Estimation and inference by compact encoding (with discussion). Journal of the Royal Statistical Society series B, 49: 240–265, 1987.


  27. C. K. I. Williams. Prediction with Gaussian processes: From linear regression to linear prediction and beyond. Technical report, Aston University, UK, 1997. To appear in: Learning and Inference in Graphical Models, ed. M. I. Jordan, Kluwer, 1998.


Copyright information

© 2001 Springer-Verlag Wien

About this chapter

Cite this chapter

Chervonenkis, A., Gammerman, A., Herbster, M. (2001). A combined Bayes — maximum likelihood method for regression. In: Della Riccia, G., Lenz, HJ., Kruse, R. (eds) Data Fusion and Perception. International Centre for Mechanical Sciences, vol 431. Springer, Vienna. https://doi.org/10.1007/978-3-7091-2580-9_2

  • DOI: https://doi.org/10.1007/978-3-7091-2580-9_2

  • Publisher Name: Springer, Vienna

  • Print ISBN: 978-3-211-83683-5

  • Online ISBN: 978-3-7091-2580-9
