Abstract
The linear spline regression problem is to determine a piecewise linear function for estimating a set of given points while minimizing a given measure of misfit or error. This is a classical problem in computational statistics and operations research; dynamic programming was proposed as a solution technique more than 40 years ago by Bellman and Roth (J Am Stat Assoc 64:1079–1084, 1969). The algorithm requires a discretization of the solution space to define a grid of candidate breakpoints. This paper proposes an adaptive refinement scheme for the grid of candidate breakpoints in order to allow the dynamic programming method to scale to larger instances of the problem. We evaluate the quality of solutions found on small instances compared with optimal solutions determined by a novel integer programming formulation of the problem. We also consider a generalization of the linear spline regression problem to fit multiple curves that share breakpoint horizontal coordinates, and we extend our method to solve the generalized problem. Computational experiments verify that our nonuniform grid construction schemes are useful for computing high-quality solutions for both the single-curve and two-curve linear spline regression problems.
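To make the grid-based dynamic program concrete, the following is a minimal sketch for a single curve. It is not the authors' implementation and omits the adaptive refinement entirely. It assumes a squared-error misfit, a fixed grid given by strictly increasing horizontal candidates `t` and vertical candidates `s` shared by all breakpoints, and data lying within `[t[0], t[-1]]`. The names `segment_error` and `fit_linear_spline` are hypothetical.

```python
def segment_error(points, x0, y0, x1, y1, include_right):
    """Squared error of the line from (x0, y0) to (x1, y1) on points with
    x0 <= x < x1; the right endpoint is included only when requested."""
    err = 0.0
    for x, y in points:
        if x0 <= x < x1 or (include_right and x == x1):
            pred = y0 + (y1 - y0) * (x - x0) / (x1 - x0)
            err += (y - pred) ** 2
    return err


def fit_linear_spline(points, t, s, m):
    """Best continuous spline with exactly m segments whose breakpoints lie
    on the grid t x s. Returns (error, list of breakpoints)."""
    p, q = len(t), len(s)
    if m < 1 or p < m + 1:
        raise ValueError("need at least m + 1 candidate breakpoints")
    # F[(seg, i, j)] = (error, predecessor state) for the best fit that uses
    # `seg` segments and whose last breakpoint is (t[i], s[j]).
    F = {(0, 0, j): (0.0, None) for j in range(q)}
    for seg in range(1, m + 1):
        # Leave enough candidates to the right for the remaining segments.
        for i in range(seg, p - m + seg):
            closes = i == p - 1  # the final segment also covers x == t[-1]
            for j in range(q):
                best = None
                for k in range(seg - 1, i):
                    for j0 in range(q):
                        prev = F.get((seg - 1, k, j0))
                        if prev is None:
                            continue
                        e = prev[0] + segment_error(
                            points, t[k], s[j0], t[i], s[j], closes)
                        if best is None or e < best[0]:
                            best = (e, (seg - 1, k, j0))
                F[(seg, i, j)] = best
    # Pick the best vertical value at the last breakpoint, then backtrack.
    state = min(((m, p - 1, j) for j in range(q)), key=lambda st: F[st][0])
    error = F[state][0]
    breakpoints = []
    while state is not None:
        _, i, j = state
        breakpoints.append((t[i], s[j]))
        state = F[state][1]
    return error, breakpoints[::-1]


# Example: five points on a "V" shape, fitted with two segments.
pts = [(0, 2), (1, 1), (2, 0), (3, 1), (4, 2)]
print(fit_linear_spline(pts, [0, 1, 2, 3, 4], [0, 1, 2], 2))
```

The running time of this sketch grows with the number of grid points in `t` and `s`, which is why the choice of grid matters for larger instances.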






References
Aronov, B., Asanov, T., Katoh, N., Mehlhorn, K.: Polyline fitting of planar points under min-sum criteria. Int. J. Comput. Geom. Appl. 16(2&3), 97–116 (2006)
Bache, K., Lichman, M.: UCI machine learning repository. School of Information and Computer Sciences, University of California, Irvine (2013). http://archive.ics.uci.edu/ml
Bai, J., Perron, P.: Computation and analysis of multiple structural change models. J. Appl. Econom. 18, 1–22 (2003)
Bellman, R., Roth, R.: Curve fitting by segmented straight lines. J. Am. Stat. Assoc. 64, 1079–1084 (1969)
Ertel, J.E., Fowlkes, E.B.: Some algorithms for linear spline and piecewise multiple linear regression. J. Am. Stat. Assoc. 71(355), 640–648 (1976)
Friedman, J.H.: Multivariate adaptive regression splines. Ann. Stat. 19(1), 1–141 (1991)
Guthery, S.B.: Partition regression. J. Am. Stat. Assoc. 69(348), 945–947 (1974)
Hannah, L.A., Dunson, D.B.: Multivariate convex regression with adaptive partitioning. J. Mach. Learn. Res. 14, 3261–3294 (2013)
Kehagias, A., Nidelkou, E., Petridis, V.: A dynamic programming segmentation procedure for hydrological and environmental time series. Stoch. Environ. Res. Risk Assess. 20(1), 77–94 (2005)
Toriello, A., Vielma, J.P.: Fitting piecewise linear continuous functions. Eur. J. Oper. Res. 219(1), 89–95 (2012)
Weber, G.-W., Batmaz, I., Köksal, G., Taylan, P., Yerlikaya-Özkurt, F.: CMARS: a new contribution to nonparametric regression with multivariate adaptive regression splines supported by continuous optimization. Inverse Probl. Sci. Eng. 20(3), 371–400 (2012)
Appendix
1.1 Proof of Proposition 2
The DP requires the evaluation of \(F_{\widehat{m}}(i,j)\) for \(\widehat{m}=1,\ldots , m\), \(i=1,\ldots ,p\), and \(j\in S(t_i)\).
First, \(F_1(i,j)\) is evaluated for \(i=2,\ldots ,p-m+1\) and \(j\in S(t_i)\), and each such evaluation is \(O(1)\) (assuming that \(E\) is given). For each \(i\in \{1,\ldots ,p-(m-1)\}\), \(|S^w(t_i)|\le q\) for each \(w\in W\). Then \(q^{2v}\) operations are used to enumerate elements of \(S^w(t_1)\times S^w(t_i)\) over all \(w\in W\) in order to determine \(F_1(i,j)\) for \(j\in S(t_i)\). Hence, in total at most \((p-m)q^{2v}\) evaluations are required to determine \(F_1(i,j)\) for \(i=2,\ldots ,p-m+1\) and \(j\in S(t_i)\).
Next, for \(\widehat{m}\in \{ 2,\ldots ,m\}\), \(F_{\widehat{m}}(i,j)\) is evaluated for \(i\in I \mathop {=}\limits ^{\text {Def}} \{\widehat{m}+1,\ldots ,p-m+\widehat{m}\}\) and \(j\in S(t_i)\). Note that \(\left| I\right| = p-m\). For a given \(i\in I\) and \(j\in S(t_{i})\), \(F_{\widehat{m}}(i,j)\) evaluates \(F_{\widehat{m}-1}(k,\cdot )\) for \(k\in \{\widehat{m}+1,\ldots , i-1\}\). For each pair \((k,i)\in I\times I\) with \(k\ne i\), and each \(w\in W\), at most \(q^2\) evaluations are required, so the number of segment combinations over all \(w\in W\) is \(q^{2\left| W\right| }=q^{2v}\). There are at most \(\binom{p-m}{2}\) such pairs (where \(k\ne i\)), so at most \(\frac{(p-m)(p-m-1)}{2}q^{2v}\) evaluations are required to compute \(F_{\widehat{m}}(i,j)\) for \(i\in I\) and \(j\in S(t_i)\).
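Hence, summing the count for \(F_1\) and the counts for \(\widehat{m}=2,\ldots ,m\), the number of operations is bounded by
\[(p-m)q^{2v} + (m-1)\frac{(p-m)(p-m-1)}{2}q^{2v} = O\left( mp^2q^{2v}\right) .\]
\(\square \)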
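As a quick numerical illustration of the counting argument, the sketch below adds the two counts stated in the proof: \((p-m)q^{2v}\) for \(F_1\), plus \(\binom{p-m}{2}q^{2v}\) for each of the \(m-1\) later stages. The function name `dp_operation_bound` is hypothetical, and the sample values of \(p\), \(m\), \(q\), and \(v\) are chosen only for illustration.

```python
from math import comb


def dp_operation_bound(p, m, q, v):
    """Upper bound on DP evaluations: p candidate horizontal coordinates,
    m segments, at most q vertical candidates per curve, v curves."""
    combos = q ** (2 * v)                       # endpoint pairs over all curves
    first_stage = (p - m) * combos              # evaluations for F_1
    later_stages = (m - 1) * comb(p - m, 2) * combos  # stages 2, ..., m
    return first_stage + later_stages


print(dp_operation_bound(10, 3, 5, 1))  # single curve: 1225
print(dp_operation_bound(10, 3, 5, 2))  # two curves: 30625
```

Going from one curve to two multiplies the bound by \(q^2\) (here 25), which shows how strongly the bound depends on the number of vertical candidates \(q\).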
Cite this article
Goldberg, N., Kim, Y., Leyffer, S. et al. Adaptively refined dynamic program for linear spline regression. Comput Optim Appl 58, 523–541 (2014). https://doi.org/10.1007/s10589-014-9647-y
DOI: https://doi.org/10.1007/s10589-014-9647-y