Abstract
The prediction sum of squares statistic uses the principle of leave-one-out cross-validation in linear least squares regression. It is computationally attractive, as it can be computed non-iteratively. However, it has limitations: it does not handle coupled measurements, which should be held out simultaneously, and is specific to the principle of leave-one-out, which is known to overfit when used for selecting a model’s complexity. We propose multiple-exclusion PRESS (MEXPRESS), which generalizes PRESS to coupled measurements and other types of cross-validation, while retaining computational efficiency with the non-iterative MEXPRESS formula. Using MEXPRESS, various strategies to resolve overfitting can be efficiently implemented. The core principle is to exclude training data too ‘close’ or too ‘similar’ to the validation data. We show that this allows one to select the number of control points automatically in three cases: (i) the estimation of linear fractional warps for dense image registration from point correspondences, (ii) surface reconstruction from a dense depth-map obtained by a depth sensor and (iii) surface reconstruction from a sparse point cloud obtained by shape-from-template.
Notes
We use \(l_{\max } = 6\) for \(n=10\), \(l_{\max } = 7\) for \(n=15\) and \(l_{\max } = \min (\text {round}(\frac{m}{2}),100)\) otherwise.
References
Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, AC–19(6), 716–723.
Allen, D. M. (1971). Mean square error of prediction as a criterion for selecting variables. Technometrics, 13(3), 469–475.
Bartoli, A. (2008). Maximizing the predictivity of smooth deformable image warps through cross-validation. Journal of Mathematical Imaging and Vision, 31(2–3), 133–145.
Bartoli, A. (2009). On computing the prediction sum of squares statistic in linear least squares problems with multiple parameter or measurement sets. International Journal of Computer Vision, 85(2), 133–142.
Bartoli, A., Perriollat, M., & Chambon, S. (2010). Generalized thin-plate spline warps. International Journal of Computer Vision, 88(1), 85–110.
Bartoli, A., Pizarro, D., & Collins, T. (2013). A robust analytical solution to isometric shape-from-template with focal length calibration. In International conference on computer vision.
Bartoli, A., Gérard, Y., Chadebecq, F., Collins, T., & Pizarro, D. (2015). Shape-from-template. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(10), 2099–2118.
Bishop, C. M. (1995). Neural networks for pattern recognition. Oxford: Oxford University Press.
Bookstein, F. L. (1989). Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(6), 567–585.
Breiman, L. (1993). Fitting additive models to regression data. Computational Statistics and Data Analysis, 15, 13–46.
Brunet, F., Bartoli, A., Malgouyres, R., & Navab, N. (2009). NURBS warps. In British machine vision conference.
Dierckx, P. (1981). An algorithm for surface-fitting with spline functions. IMA Journal of Numerical Analysis, 1(3), 267–283.
Dierckx, P. (1993). Curve and surface fitting with splines. Oxford: Oxford University Press.
Forsyth, D., & Ponce, J. (2003). Computer vision: A modern approach (2nd ed.). Upper Saddle River: Prentice Hall.
Friedman, J. H. (1991). Multivariate adaptive regression splines. The Annals of Statistics, 19(1), 1–67.
Hartley, R. I., & Zisserman, A. (2003). Multiple view geometry in computer vision (2nd ed.). Cambridge, MA: Cambridge University Press.
Hastie, T., Tibshirani, R., & Friedman, J. (2001). The elements of statistical learning. Berlin: Springer.
Jupp, D. L. B. (1978). Approximation to data by splines with free knots. SIAM Journal on Numerical Analysis, 15(2), 328–343.
Jurie, F., & Triggs, B. (2005). Creating efficient codebooks for visual recognition. In International conference on computer vision.
Lloyd, S. P. (1982). Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2), 129–137.
Miyata, S., & Shen, X. (2005). Free-knot splines and adaptive knot selection. Journal of the Japanese Statistical Society, 35(2), 303–324.
Molinari, N., Durand, J.-F., & Sabatier, R. (2004). Bounded optimal knots for regression splines. Computational Statistics and Data Analysis, 45, 159–178.
Nguyen, N., Milanfar, P., & Golub, G. (2001). Efficient generalized cross-validation with applications to parametric image restoration and resolution enhancement. IEEE Transactions on Image Processing, 10(9), 1299–1308.
Perriollat, M., & Bartoli, A. (2013). A computational model of bounded developable surfaces with application to image-based 3D reconstruction. Computer Animation and Virtual Worlds, 24(5), 459–476.
Perriollat, M., Hartley, R., & Bartoli, A. (2011). Monocular template-based reconstruction of inextensible surfaces. International Journal of Computer Vision, 95(2), 124–137.
Rueckert, D., Sonoda, L. I., Hayes, C., Hill, D. L. G., Leach, M. O., & Hawkes, D. J. (1999). Nonrigid registration using free-form deformations: Application to breast MR images. IEEE Transactions on Medical Imaging, 18(8), 712–721.
Salzmann, M., Pilet, J., Ilic, S., & Fua, P. (2007). Surface deformation models for nonrigid 3D shape recovery. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(8), 1–7.
Schölkopf, B., & Smola, A. J. (2001). Learning with kernels: Support vector machines, regularization, optimization, and beyond. Cambridge, MA: MIT Press.
Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6, 461–464.
Sevilla-Lara, L., & Learned-Miller, E. (2012). Distribution fields for tracking. In International conference on computer vision and pattern recognition.
Steinbrücker, F., Kerl, C., Sturm, J., & Cremers, D. (2013). Large-scale multi-resolution surface reconstruction from RGB-D sequences. In International conference on computer vision.
Süssmuth, J., Meyer, Q., & Greiner, G. (2010). Surface reconstruction based on hierarchical floating radial basis functions. Computer Graphics Forum, 29(6), 1854–1864.
Tang, H., Joshi, N., & Kapoor, A. (2011). Learning a blind measure of perceptual image quality. In International conference on computer vision and pattern recognition.
Varol, A., Salzmann, M., Fua, P., & Urtasun, R. (2012). A constrained latent variable model. In International conference on computer vision and pattern recognition.
Xu, W., Salzmann, M., Wang, Y., & Liu, Y. (2014). Nonrigid surface registration and completion from RGBD images. In European conference on computer vision.
Yan, X., & Su, X.G. (2009). Linear regression analysis: Theory and computing. Singapore: World Scientific Publishing. ISBN 9812834109.
Zollhöfer, M., Niessner, M., Izadi, S., Rhemann, C., Zach, C., Fisher, M., et al. (2014). Real-time non-rigid reconstruction using an RGB-D camera. ACM Transactions on Graphics, 33(4), 156.
Acknowledgments
I thank Mathias Gallardo and Ajad Chhatkuli for their help in creating the surface fitting datasets and the authors of Varol et al. (2012) for the SfT dataset from their paper. This research has received funding from the EU’s FP7 through the ERC research Grant 307483 FLEXABLE.
Communicated by Jiri Matas.
Appendix: Proof of Proposition 1
The proof of Proposition 1 follows the same steps as the proof of the PRESS formula (Bartoli 2009). It requires the following lemma.
Lemma 1
The model estimate \(\bar{\mathbf {x}}_{(\mathcal {K})}\) obtained by holding out the set of measurements with index set \(\mathcal {K}\subseteq [1,m]\) is the same as the one obtained by replacing the \(\mathcal {K}\) responses by their predictions with the model \(\bar{\mathbf {x}}_{(\mathcal {K})}\):
$$\bar{\mathbf {x}}_{(\mathcal {K})} = {\mathtt A}^{\dag } {\tilde{\mathbf {b}}}_{(\mathcal {K})} \quad \text {with} \quad {\tilde{\mathbf {b}}}_{(\mathcal {K})} \mathop {=}\limits ^{ def }\mathtt I _{(\mathcal {K})} \mathbf {b} + \mathtt I _{(-\mathcal {K})} \mathtt A \bar{\mathbf {x}}_{(\mathcal {K})},$$
where we used the notation \(\mathtt I _{(\mathcal {K})}\) for an identity matrix whose diagonal entries with index in \(\mathcal {K}\) are put to zero, and \(\mathtt I _{(-\mathcal {K})} \mathop {=}\limits ^{ def }\mathtt I- \mathtt I _{(\mathcal {K})}\). Formally, \(\mathtt I _{(\mathcal {K})} = \text {diag}(\mathbf {d})\) with \(\mathbf {d}_\mathcal {K}= 0\) and \(\mathbf {d}_{-\mathcal {K}} = 1\).
Proof of Lemma 1
Using the definition of \({\tilde{\mathbf {b}}}_{(\mathcal {K})}\) we have:
$${\mathtt A}^{\dag } {\tilde{\mathbf {b}}}_{(\mathcal {K})} = {\mathtt A}^{\dag } \left( \mathtt I _{(\mathcal {K})} \mathbf {b} + \mathtt I _{(-\mathcal {K})} \mathtt A \bar{\mathbf {x}}_{(\mathcal {K})} \right).$$
Because \(\mathtt I _{(-\mathcal {K})} = \mathtt I- \mathtt I _{(\mathcal {K})}\) this may be expanded as:
$${\mathtt A}^{\dag } {\tilde{\mathbf {b}}}_{(\mathcal {K})} = {\mathtt A}^{\dag } \mathtt I _{(\mathcal {K})} \mathbf {b} + {\mathtt A}^{\dag } \mathtt A \bar{\mathbf {x}}_{(\mathcal {K})} - {\mathtt A}^{\dag } \mathtt I _{(\mathcal {K})} \mathtt A \bar{\mathbf {x}}_{(\mathcal {K})}.$$
The second term simplifies to \({\mathtt A}^{\dag } \mathtt A \bar{\mathbf {x}}_{(\mathcal {K})} = \bar{\mathbf {x}}_{(\mathcal {K})}\). For the third term, the definition of \(\bar{\mathbf {x}}_{(\mathcal {K})}\) as the minimizer of \(\Vert \mathtt I _{(\mathcal {K})} (\mathtt A \mathbf {x} - \mathbf {b}) \Vert ^2\) gives the normal equations \(\mathtt A ^\top \mathtt I _{(\mathcal {K})} \mathtt A \bar{\mathbf {x}}_{(\mathcal {K})} = \mathtt A ^\top \mathtt I _{(\mathcal {K})} \mathbf {b}\), hence \({\mathtt A}^{\dag } \mathtt I _{(\mathcal {K})} \mathtt A \bar{\mathbf {x}}_{(\mathcal {K})} = {\mathtt A}^{\dag } \mathtt I _{(\mathcal {K})} \mathbf {b}\). The overall expression thus simplifies to:
$${\mathtt A}^{\dag } {\tilde{\mathbf {b}}}_{(\mathcal {K})} = {\mathtt A}^{\dag } \mathtt I _{(\mathcal {K})} \mathbf {b} + \bar{\mathbf {x}}_{(\mathcal {K})} - {\mathtt A}^{\dag } \mathtt I _{(\mathcal {K})} \mathbf {b} = \bar{\mathbf {x}}_{(\mathcal {K})},$$
concluding the proof. \(\square \)
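As a sanity check, Lemma 1 can be verified numerically. The following sketch (not from the paper; it assumes NumPy, randomly drawn data and a full-column-rank design matrix) fits the model with some measurements held out, substitutes their predictions into the response vector, and refits on all measurements:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 20, 5
A = rng.standard_normal((m, n))   # full column rank with probability 1
b = rng.standard_normal(m)
K = [2, 7, 11]                    # held-out index set
kept = np.ones(m, dtype=bool)
kept[K] = False

# x̄_(K): least squares fit on the kept measurements only
x_heldout, *_ = np.linalg.lstsq(A[kept], b[kept], rcond=None)

# b̃_(K): replace the held-out responses by their predictions under x̄_(K)
b_tilde = b.copy()
b_tilde[K] = A[K] @ x_heldout

# Lemma 1: the full-data fit on b̃_(K) recovers x̄_(K)
x_refit = np.linalg.pinv(A) @ b_tilde
print(np.allclose(x_refit, x_heldout))  # True
```

The held-out rows contribute zero residual under \(\bar{\mathbf {x}}_{(\mathcal {K})}\) after the substitution, which is why the two fits coincide.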
Proof of Proposition 1
The case of an empty set \(\mathcal {K}\) is straightforward as then \(\mathbf {e}_{(\mathcal {K})}\) is a zero-length vector. We start by rewriting the predictions in \(\mathcal {K}\) by the model fitted with all measurements as \(\mathtt A _\mathcal {K}\bar{\mathbf {x}} = \mathtt A _\mathcal {K}{\mathtt A}^{\dag } \mathbf {b} = \hat{\mathtt A} _\mathcal {K}\mathbf {b}\). Similarly, we rewrite the predictions in \(\mathcal {K}\) by the model fitted with all but the measurements in \(\mathcal {K}\) as \(\mathtt A _\mathcal {K}\bar{\mathbf {x}}_{(\mathcal {K})} = \mathtt A _\mathcal {K}{\mathtt A}^{\dag } {\tilde{\mathbf {b}}}_{(\mathcal {K})} = \hat{\mathtt A} _\mathcal {K}{\tilde{\mathbf {b}}}_{(\mathcal {K})}\), where the first equality is given by Lemma 1. The difference between the two predictions is thus given by:
$$\mathtt A _\mathcal {K}\bar{\mathbf {x}} - \mathtt A _\mathcal {K}\bar{\mathbf {x}}_{(\mathcal {K})} = \hat{\mathtt A} _\mathcal {K}\left( \mathbf {b} - {\tilde{\mathbf {b}}}_{(\mathcal {K})} \right) = \hat{\mathtt A} _\mathcal {K}\left( \mathtt I- \mathtt I _{(\mathcal {K})} \right) \mathbf {b} - \hat{\mathtt A} _\mathcal {K}\mathtt I _{(-\mathcal {K})} \mathtt A \bar{\mathbf {x}}_{(\mathcal {K})},$$
where the second equality is obtained by using the definition of \({\tilde{\mathbf {b}}}_{(\mathcal {K})}\). We then use \(\mathtt I-\mathtt I _{(\mathcal {K})}=\mathtt I _{(-\mathcal {K})}\), leading to:
$$\mathtt A _\mathcal {K}\bar{\mathbf {x}} - \mathtt A _\mathcal {K}\bar{\mathbf {x}}_{(\mathcal {K})} = \hat{\mathtt A} _\mathcal {K}\mathtt I _{(-\mathcal {K})} \mathbf {b} - \hat{\mathtt A} _\mathcal {K}\mathtt I _{(-\mathcal {K})} \mathtt A \bar{\mathbf {x}}_{(\mathcal {K})}.$$
By noting that \(\hat{\mathtt A} _\mathcal {K}\mathtt I _{(-\mathcal {K})}\mathtt A = \hat{\mathtt A} _{\mathcal {K},\mathcal {K}}\mathtt A _\mathcal {K}\) and \(\hat{\mathtt A} _\mathcal {K}\mathtt I _{(-\mathcal {K})}\mathbf {b} = \hat{\mathtt A} _{\mathcal {K},\mathcal {K}}\mathbf {b}_\mathcal {K}\) we rearrange the equation as:
$$\mathtt A _\mathcal {K}\bar{\mathbf {x}} = \mathtt A _\mathcal {K}\bar{\mathbf {x}}_{(\mathcal {K})} + \hat{\mathtt A} _{\mathcal {K},\mathcal {K}} \left( \mathbf {b}_\mathcal {K}- \mathtt A _\mathcal {K}\bar{\mathbf {x}}_{(\mathcal {K})} \right).$$
We subtract \(\mathbf {b}_\mathcal {K}\) from each side of the equation:
$$\mathtt A _\mathcal {K}\bar{\mathbf {x}} - \mathbf {b}_\mathcal {K}= \left( \mathtt I- \hat{\mathtt A} _{\mathcal {K},\mathcal {K}} \right) \left( \mathtt A _\mathcal {K}\bar{\mathbf {x}}_{(\mathcal {K})} - \mathbf {b}_\mathcal {K}\right),$$
and arrive at:
$$\mathbf {e}_{(\mathcal {K})} = \mathtt A _\mathcal {K}\bar{\mathbf {x}}_{(\mathcal {K})} - \mathbf {b}_\mathcal {K}= \left( \mathtt I- \hat{\mathtt A} _{\mathcal {K},\mathcal {K}} \right) ^{-1} \left( \mathtt A _\mathcal {K}\bar{\mathbf {x}} - \mathbf {b}_\mathcal {K}\right),$$
concluding the proof. \(\square \)
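Proposition 1 can likewise be checked numerically by comparing the direct (refit-based) held-out prediction error with the non-iterative formula. The sketch below is not from the paper; it assumes NumPy and the random data shown, with \(\hat{\mathtt A} = \mathtt A {\mathtt A}^{\dag }\) the hat matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 20, 5
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
K = [0, 4, 9]                     # coupled measurements, held out together
kept = np.ones(m, dtype=bool)
kept[K] = False

A_pinv = np.linalg.pinv(A)
x_all = A_pinv @ b                # x̄: fit on all measurements
A_hat = A @ A_pinv                # hat matrix Â

# Direct computation: refit without the measurements in K
x_heldout, *_ = np.linalg.lstsq(A[kept], b[kept], rcond=None)
e_direct = A[K] @ x_heldout - b[K]

# Non-iterative formula: e_(K) = (I - Â_KK)^{-1} (A_K x̄ - b_K)
A_hat_KK = A_hat[np.ix_(K, K)]
e_formula = np.linalg.solve(np.eye(len(K)) - A_hat_KK, A[K] @ x_all - b[K])

print(np.allclose(e_direct, e_formula))  # True
```

Only one fit on all the data is needed; each held-out set \(\mathcal {K}\) then costs a single \(|\mathcal {K}| \times |\mathcal {K}|\) solve, which is what makes the formula computationally attractive for cross-validation.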
Bartoli, A. Generalizing the Prediction Sum of Squares Statistic and Formula, Application to Linear Fractional Image Warp and Surface Fitting. Int J Comput Vis 122, 61–83 (2017). https://doi.org/10.1007/s11263-016-0954-x