Abstract
The paper addresses the task of polynomial regression, i.e., the task of inducing polynomials from numeric data that can be used to predict the value of a selected numeric variable. As in other learning tasks, we face the problem of finding an optimal trade-off between the complexity of the induced model and its predictive error. One of the approaches to finding this optimal trade-off is the minimal description length (MDL) principle. In this paper, we propose an MDL scheme for polynomial regression, which includes coding schemes for polynomials and the errors they make on data. We empirically compare this principled MDL scheme to an ad-hoc MDL scheme and show that it performs better. The improvements in performance are such that the polynomial regression approach we propose is now comparable in performance to other commonly used methods for regression, such as model trees.
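The trade-off described above can be illustrated with a minimal sketch. It is not the coding scheme proposed in the paper: it assumes a single input variable and a generic, crude two-part score (half of log2 n bits per coefficient, plus a Gaussian error term that is defined only up to model-independent constants). The function names `fit_polynomial`, `description_length`, and `select_degree` are hypothetical and chosen for illustration only.

```python
import math


def fit_polynomial(xs, ys, degree):
    """Least-squares fit via the normal equations.

    Returns the coefficients with the lowest power first.
    """
    m = degree + 1
    # Augmented normal-equation matrix [A^T A | A^T y].
    rows = [
        [sum(x ** (i + j) for x in xs) for j in range(m)]
        + [sum(y * x ** i for x, y in zip(xs, ys))]
        for i in range(m)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(m):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][m] / rows[i][i] for i in range(m)]


def residual_sum_of_squares(coeffs, xs, ys):
    """Sum of squared prediction errors of the polynomial on the data."""
    return sum(
        (y - sum(c * x ** p for p, c in enumerate(coeffs))) ** 2
        for x, y in zip(xs, ys)
    )


def description_length(coeffs, xs, ys):
    """Crude two-part score in bits: model cost plus error cost.

    Assumption: this is a generic BIC-like approximation, not the paper's
    scheme. The error term omits constants shared by all models, so the
    total can be negative; only differences between models are meaningful.
    """
    n = len(xs)
    rss = max(residual_sum_of_squares(coeffs, xs, ys), 1e-12)  # avoid log(0)
    model_bits = 0.5 * len(coeffs) * math.log2(n)
    error_bits = 0.5 * n * math.log2(rss / n)
    return model_bits + error_bits


def select_degree(xs, ys, max_degree):
    """Score every degree up to max_degree and return the cheapest one."""
    scores = {
        d: description_length(fit_polynomial(xs, ys, d), xs, ys)
        for d in range(max_degree + 1)
    }
    return min(scores, key=scores.get), scores


# Toy data: a quadratic with a small alternating perturbation.
xs = [float(i) for i in range(10)]
ys = [x * x + 0.1 * (-1) ** i for i, x in enumerate(xs)]
best_degree, scores = select_degree(xs, ys, max_degree=4)
print(best_degree, scores)
```

On this toy data the lower degrees pay heavily in the error term, while the higher degrees reduce the error too little to repay their extra coefficients. This is the balance that an MDL scheme is meant to strike.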
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pečkov, A., Džeroski, S., Todorovski, L. (2008). A Minimal Description Length Scheme for Polynomial Regression. In: Washio, T., Suzuki, E., Ting, K.M., Inokuchi, A. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2008. Lecture Notes in Computer Science, vol 5012. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68125-0_26
DOI: https://doi.org/10.1007/978-3-540-68125-0_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68124-3
Online ISBN: 978-3-540-68125-0
eBook Packages: Computer Science, Computer Science (R0)