A Minimal Description Length Scheme for Polynomial Regression

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5012)

Abstract

The paper addresses the task of polynomial regression, i.e., the task of inducing polynomials from numeric data that can be used to predict the value of a selected numeric variable. As in other learning tasks, we face the problem of finding an optimal trade-off between the complexity of the induced model and its predictive error. One of the approaches to finding this optimal trade-off is the minimal description length (MDL) principle. In this paper, we propose an MDL scheme for polynomial regression, which includes coding schemes for polynomials and the errors they make on data. We empirically compare this principled MDL scheme to an ad-hoc MDL scheme and show that it performs better. The improvements in performance are such that the polynomial regression approach we propose is now comparable in performance to other commonly used methods for regression, such as model trees.
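As a rough illustration of the complexity/error trade-off described above, the sketch below scores univariate polynomial fits with a generic two-part description length: bits to encode the model plus bits to encode its residuals. It is not the coding scheme proposed in the paper. The function names `mdl_score` and `select_degree`, the (k/2) log2 n cost for k coefficients, and the Gaussian residual cost are all assumptions chosen for brevity, and the data is synthetic.

```python
import numpy as np


def mdl_score(x, y, degree):
    """Crude two-part code length (in bits, up to an additive constant)
    for a least-squares polynomial fit of the given degree.

    Assumption: this is a generic BIC-style approximation, not the
    paper's coding scheme.
    """
    n = len(x)
    coeffs = np.polyfit(x, y, degree)           # least-squares fit
    residuals = y - np.polyval(coeffs, x)
    rss = max(float(np.sum(residuals ** 2)), 1e-12)  # avoid log(0)

    k = degree + 1                              # number of coefficients
    model_bits = 0.5 * k * np.log2(n)           # cost of the polynomial
    error_bits = 0.5 * n * np.log2(rss / n)     # cost of the errors
    return model_bits + error_bits


def select_degree(x, y, max_degree=6):
    """Return the degree with the smallest score and all scores."""
    scores = {d: mdl_score(x, y, d) for d in range(max_degree + 1)}
    return min(scores, key=scores.get), scores


# Synthetic data (hypothetical): a noisy quadratic.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 200)
y = 1.0 - 2.0 * x + 3.0 * x ** 2 + rng.normal(scale=0.1, size=x.size)

best, scores = select_degree(x, y)
for d, s in scores.items():
    print(f"degree {d}: {s:.1f} bits")
print("selected degree:", best)
```

Under these assumptions, the score falls steeply while added terms remove real structure from the residuals, and the per-coefficient penalty takes over once they only fit noise. A principled scheme of the kind the paper proposes would replace both cost terms with explicit codes for the polynomial's structure and for its errors.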

Editor information

Takashi Washio, Einoshin Suzuki, Kai Ming Ting, Akihiro Inokuchi

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Pečkov, A., Džeroski, S., Todorovski, L. (2008). A Minimal Description Length Scheme for Polynomial Regression. In: Washio, T., Suzuki, E., Ting, K.M., Inokuchi, A. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2008. Lecture Notes in Computer Science, vol 5012. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68125-0_26

  • DOI: https://doi.org/10.1007/978-3-540-68125-0_26

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-68124-3

  • Online ISBN: 978-3-540-68125-0

  • eBook Packages: Computer Science, Computer Science (R0)
