
Generalizing the Prediction Sum of Squares Statistic and Formula, Application to Linear Fractional Image Warp and Surface Fitting


Abstract

The prediction sum of squares (PRESS) statistic uses the principle of leave-one-out cross-validation in linear least squares regression. It is computationally attractive, as it can be computed non-iteratively. However, it has limitations: it does not handle coupled measurements, which should be held out simultaneously, and it is specific to leave-one-out, which is known to overfit when used to select a model’s complexity. We propose multiple-exclusion PRESS (MEXPRESS), which generalizes PRESS to coupled measurements and other types of cross-validation while retaining computational efficiency through the non-iterative MEXPRESS formula. Using MEXPRESS, various strategies to resolve overfitting can be implemented efficiently. The core principle is to exclude training data that is too ‘close’ or too ‘similar’ to the validation data. We show that this allows the number of control points to be selected automatically in three cases: (i) the estimation of linear fractional warps for dense image registration from point correspondences, (ii) surface reconstruction from a dense depth-map obtained by a depth sensor, and (iii) surface reconstruction from a sparse point cloud obtained by shape-from-template.
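
The non-iterative computation mentioned above can be illustrated with a short sketch. This is not taken from the paper; it uses the classic hat-matrix identity for leave-one-out residuals and hypothetical synthetic data:

    import numpy as np

    # Classic PRESS for linear least squares A x ~ b: the leave-one-out residual of
    # measurement i equals the ordinary residual divided by (1 - h_ii), where h_ii
    # is the i-th diagonal entry of the hat matrix A A^+. Synthetic data for illustration.
    rng = np.random.default_rng(0)
    m, n = 50, 4
    A = rng.standard_normal((m, n))
    b = A @ rng.standard_normal(n) + 0.1 * rng.standard_normal(m)

    x_bar = np.linalg.lstsq(A, b, rcond=None)[0]        # estimate from all m measurements
    h = np.diag(A @ np.linalg.pinv(A))                  # leverages h_ii
    press = np.sum(((A @ x_bar - b) / (1.0 - h)) ** 2)  # non-iterative PRESS

    # Brute-force check: refit m times, leaving one measurement out each time.
    press_loo = sum(
        (A[i] @ np.linalg.lstsq(A[np.arange(m) != i], b[np.arange(m) != i], rcond=None)[0] - b[i]) ** 2
        for i in range(m)
    )
    assert np.isclose(press, press_loo)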



Notes

  1. We use \(l_{\max } = 6\) for \(n=10\), \(l_{\max } = 7\) for \(n=15\) and \(l_{\max } = \min (\text {round}(\frac{m}{2}),100)\) otherwise.

References

  • Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, AC–19(6), 716–723.


  • Allen, D. M. (1971). Mean square error of prediction as a criterion for selecting variables. Technometrics, 13(3), 469–475.


  • Bartoli, A. (2008). Maximizing the predictivity of smooth deformable image warps through cross-validation. Journal of Mathematical Imaging and Vision, 31(2–3), 133–145.


  • Bartoli, A. (2009). On computing the prediction sum of squares statistic in linear least squares problems with multiple parameter or measurement sets. International Journal of Computer Vision, 85(2), 133–142.


  • Bartoli, A., Perriollat, M., & Chambon, S. (2010). Generalized thin-plate spline warps. International Journal of Computer Vision, 88(1), 85–110.


  • Bartoli, A., Pizarro, D., & Collins, T. (2013). A robust analytical solution to isometric shape-from-template with focal length calibration. In International conference on computer vision.

  • Bartoli, A., Gérard, Y., Chadebecq, F., Collins, T., & Pizarro, D. (2015). Shape-from-template. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(10), 2099–2118.


  • Bishop, C. M. (1995). Neural networks for pattern recognition. Oxford: Oxford University Press.


  • Bookstein, F. L. (1989). Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(6), 567–585.


  • Breiman, L. (1993). Fitting additive models to regression data. Computational Statistics and Data Analysis, 15, 13–46.


  • Brunet, F., Bartoli, A., Malgouyres, R., & Navab, N. (2009). NURBS warps. In British machine vision conference.

  • Dierckx, P. (1981). An algorithm for surface-fitting with spline functions. IMA Journal of Numerical Analysis, 1(3), 267–283.


  • Dierckx, P. (1993). Curve and surface fitting with splines. Oxford: Oxford University Press.


  • Forsyth, D., & Ponce, J. (2003). Computer vision: A modern approach (2nd ed.). Upper Saddle River: Prentice Hall.


  • Friedman, J. H. (1991). Multivariate adaptive regression splines. The Annals of Statistics, 19(1), 1–67.


  • Hartley, R. I., & Zisserman, A. (2003). Multiple view geometry in computer vision (2nd ed.). Cambridge: Cambridge University Press.


  • Hastie, T., Tibshirani, R., & Friedman, J. (2001). The elements of statistical learning. Berlin: Springer.


  • Jupp, D. L. B. (1978). Approximation to data by splines with free knots. SIAM Journal on Numerical Analysis, 15(2), 328–343.


  • Jurie, F., & Triggs, B. (2005). Creating efficient codebooks for visual recognition. In International conference on computer vision.

  • Lloyd, S. P. (1982). Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2), 129–137.


  • Miyata, S., & Shen, X. (2005). Free-knot splines and adaptive knot selection. Journal of the Japanese Statistical Society, 35(2), 303–324.


  • Molinari, N., Durand, J.-F., & Sabatier, R. (2004). Bounded optimal knots for regression splines. Computational Statistics and Data Analysis, 45, 159–178.


  • Nguyen, N., Milanfar, P., & Golub, G. (2001). Efficient generalized cross-validation with applications to parametric image restoration and resolution enhancement. IEEE Transactions on Image Processing, 10(9), 1299–1308.


  • Perriollat, M., & Bartoli, A. (2013). A computational model of bounded developable surfaces with application to image-based 3D reconstruction. Computer Animation and Virtual Worlds, 24(5), 459–476.


  • Perriollat, M., Hartley, R., & Bartoli, A. (2011). Monocular template-based reconstruction of inextensible surfaces. International Journal of Computer Vision, 95(2), 124–137.


  • Rueckert, D., Sonoda, L. I., Hayes, C., Hill, D. L. G., Leach, M. O., & Hawkes, D. J. (1999). Nonrigid registration using free-form deformations: Application to breast MR images. IEEE Transactions on Medical Imaging, 18(8), 712–721.


  • Salzmann, M., Pilet, J., Ilic, S., & Fua, P. (2007). Surface deformation models for nonrigid 3D shape recovery. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(8), 1–7.


  • Schölkopf, B., & Smola, A. J. (2001). Learning with kernels: Support vector machines, regularization, optimization, and beyond. Cambridge, MA: MIT Press.


  • Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6, 461–464.


  • Sevilla-Lara, L., & Learned-Miller, E. (2012). Distribution fields for tracking. In International conference on computer vision and pattern recognition.

  • Steinbrücker, F., Kerl, C., Sturm, J., & Cremers, D. (2013). Large-scale multi-resolution surface reconstruction from RGB-D sequences. In International conference on computer vision.

  • Süssmuth, J., Meyer, Q., & Greiner, G. (2010). Surface reconstruction based on hierarchical floating radial basis functions. Computer Graphics Forum, 29(6), 1854–1864.


  • Tang, H., Joshi, N., & Kapoor, A. (2011). Learning a blind measure of perceptual image quality. In International conference on computer vision and pattern recognition.

  • Varol, A., Salzmann, M., Fua, P., & Urtasun, R. (2012). A constrained latent variable model. In International conference on computer vision and pattern recognition.

  • Xu, W., Salzmann, M., Wang, Y., & Liu, Y. (2014). Nonrigid surface registration and completion from RGBD images. In European conference on computer vision.

  • Yan, X., & Su, X.G. (2009). Linear regression analysis: Theory and computing. Singapore: World Scientific Publishing. ISBN 9812834109.

  • Zollhöfer, M., Niessner, M., Izadi, S., Rhemann, C., Zach, C., Fisher, M., et al. (2014). Real-time non-rigid reconstruction using an RGB-D camera. ACM Transactions on Graphics, 33(4), 156.



Acknowledgments

I thank Mathias Gallardo and Ajad Chhatkuli for their help in creating the surface fitting datasets and the authors of Varol et al. (2012) for the paper SfT dataset. This research has received funding from the EU’s FP7 through the ERC research Grant 307483 FLEXABLE.

Author information


Corresponding author

Correspondence to Adrien Bartoli.

Additional information

Communicated by Jiri Matas.

Appendix: Proof of Proposition 1

The proof of Proposition 1 follows the same steps as the proof of the PRESS formula (Bartoli 2009). It requires the following lemma.

Lemma 1

The model estimate \(\bar{\mathbf {x}}_{(\mathcal {K})}\) obtained by holding out the set of measurements with index set \(\mathcal {K}\subset [1,m]\) is the same as the one obtained by replacing the \(\mathcal {K}\) responses by their predictions with the model \(\bar{\mathbf {x}}_{(\mathcal {K})}\):

$$\begin{aligned} \bar{\mathbf {x}}_{(\mathcal {K})} ={\mathtt A}^{\dag } {\tilde{\mathbf {b}}}_{(\mathcal {K})} \qquad \text {with}\qquad {\tilde{\mathbf {b}}}_{(\mathcal {K})} \mathop {=}\limits ^{ def }\mathtt I _{(\mathcal {K})} \mathbf {b} + \mathtt I _{(-\mathcal {K})} \mathtt A \bar{\mathbf {x}}_{(\mathcal {K})}, \end{aligned}$$
(22)

where we use the notation \(\mathtt I _{(\mathcal {K})}\) for an identity matrix whose diagonal entries with index in \(\mathcal {K}\) are set to zero, and \(\mathtt I _{(-\mathcal {K})} \mathop {=}\limits ^{ def }\mathtt I- \mathtt I _{(\mathcal {K})}\). Formally, \(\mathtt I _{(\mathcal {K})} = \text {diag}(\mathbf {d})\) with \(\mathbf {d}_\mathcal {K}= 0\) and \(\mathbf {d}_{-\mathcal {K}} = 1\).

Proof of Lemma 1

Using the definition of \({\tilde{\mathbf {b}}}_{(\mathcal {K})}\) we have:

$$\begin{aligned} {\mathtt A}^{\dag }{\tilde{\mathbf {b}}}_{(\mathcal {K})} ={\mathtt A}^{\dag } \mathtt I _{(\mathcal {K})} \mathbf {b} + {\mathtt A}^{\dag } \mathtt I _{(-\mathcal {K})} \mathtt A \bar{\mathbf {x}}_{(\mathcal {K})} \end{aligned}$$
(23)

Because \(\mathtt I _{(-\mathcal {K})} = \mathtt I- \mathtt I _{(\mathcal {K})}\) this may be expanded as:

$$\begin{aligned} {\mathtt A}^{\dag }{\tilde{\mathbf {b}}}_{(\mathcal {K})} ={\mathtt A}^{\dag } \mathtt I _{(\mathcal {K})} \mathbf {b} + {\mathtt A}^{\dag } \mathtt A \bar{\mathbf {x}}_{(\mathcal {K})} - {\mathtt A}^{\dag } \mathtt I _{(\mathcal {K})} \mathtt A \bar{\mathbf {x}}_{(\mathcal {K})}. \end{aligned}$$
(24)

The second term is simplified to \({\mathtt A}^{\dag } \mathtt A \bar{\mathbf {x}}_{(\mathcal {K})} = \bar{\mathbf {x}}_{(\mathcal {K})}\). Using the definition of \(\bar{\mathbf {x}}_{(\mathcal {K})}\), the third term is simplified to \({\mathtt A}^{\dag } \mathtt I _{(\mathcal {K})} \mathtt A \bar{\mathbf {x}}_{(\mathcal {K})} = {\mathtt A}^{\dag }\mathtt I _{(\mathcal {K})}\mathbf {b}\). The overall expression is thus simplified to:

$$\begin{aligned} {\mathtt A}^{\dag }{\tilde{\mathbf {b}}}_{(\mathcal {K})} ={\mathtt A}^{\dag } \mathtt I _{(\mathcal {K})} \mathbf {b} + \bar{\mathbf {x}}_{(\mathcal {K})} - {\mathtt A}^{\dag } \mathtt I _{(\mathcal {K})} \mathbf {b} =\bar{\mathbf {x}}_{(\mathcal {K})}, \end{aligned}$$
(25)

concluding the proof. \(\square \)
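
As a sanity check, Lemma 1 is easy to verify numerically; the following sketch (hypothetical synthetic data, not from the paper) refits on \({\tilde{\mathbf {b}}}_{(\mathcal {K})}\) and recovers \(\bar{\mathbf {x}}_{(\mathcal {K})}\):

    import numpy as np

    # Numerical check of Lemma 1 on synthetic data: replacing the responses in K by
    # the predictions of the hold-out estimate, then fitting with all rows, returns
    # the hold-out estimate itself.
    rng = np.random.default_rng(1)
    m, n = 30, 5
    A = rng.standard_normal((m, n))
    b = rng.standard_normal(m)
    K = np.array([2, 7, 11])                                  # indices held out together

    keep = np.setdiff1d(np.arange(m), K)
    x_K = np.linalg.lstsq(A[keep], b[keep], rcond=None)[0]    # estimate without K

    b_tilde = b.copy()
    b_tilde[K] = A[K] @ x_K                                   # replace the K responses by predictions
    assert np.allclose(np.linalg.pinv(A) @ b_tilde, x_K)      # Lemma 1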

Proof of Proposition 1

The case of an empty set \(\mathcal {K}\) is straightforward as then \(\mathbf {e}_{(\mathcal {K})}\) is a zero-length vector. We start by rewriting the predictions in \(\mathcal {K}\) by the model fitted with all measurements as \(\mathtt A _\mathcal {K}\bar{\mathbf {x}} = \mathtt A _\mathcal {K}{\mathtt A}^{\dag } \mathbf {b} = \hat{\mathtt A} _\mathcal {K}\mathbf {b}\). Similarly, we rewrite the predictions in \(\mathcal {K}\) by the model fitted with all but the measurements in \(\mathcal {K}\) as \(\mathtt A _\mathcal {K}\bar{\mathbf {x}}_{(\mathcal {K})} = \mathtt A _\mathcal {K}{\mathtt A}^{\dag } {\tilde{\mathbf {b}}}_{(\mathcal {K})} = \hat{\mathtt A} _\mathcal {K}{\tilde{\mathbf {b}}}_{(\mathcal {K})}\), where the first equality is given by Lemma 1. The difference between the two predictions is thus given by:

$$\begin{aligned} \mathtt A _\mathcal {K}\bar{\mathbf {x}} - \mathtt A _\mathcal {K}\bar{\mathbf {x}}_{(\mathcal {K})} =\hat{\mathtt A} _\mathcal {K}\mathbf {b} - \hat{\mathtt A} _\mathcal {K}{\tilde{\mathbf {b}}}_{(\mathcal {K})} =\hat{\mathtt A} _\mathcal {K}\left( \mathbf {b} - \mathtt I _{(\mathcal {K})} \mathbf {b} - \mathtt I _{(-\mathcal {K})} \mathtt A \bar{\mathbf {x}}_{(\mathcal {K})}\right) , \end{aligned}$$
(26)

where the second equality is obtained by using the definition of \({\tilde{\mathbf {b}}}_{(\mathcal {K})}\). We then use \(\mathtt I-\mathtt I _{(\mathcal {K})}=\mathtt I _{(-\mathcal {K})}\), leading to:

$$\begin{aligned} \mathtt A _\mathcal {K}\bar{\mathbf {x}} - \mathtt A _\mathcal {K}\bar{\mathbf {x}}_{(\mathcal {K})} =\hat{\mathtt A} _\mathcal {K}\left( \mathtt I _{(-\mathcal {K})}\mathbf {b} - \mathtt I _{(-\mathcal {K})} \mathtt A \bar{\mathbf {x}}_{(\mathcal {K})}\right) . \end{aligned}$$
(27)

By noting that \(\hat{\mathtt A} _\mathcal {K}\mathtt I _{(-\mathcal {K})}\mathtt A = \hat{\mathtt A} _{\mathcal {K},\mathcal {K}}\mathtt A _\mathcal {K}\) and \(\hat{\mathtt A} _\mathcal {K}\mathtt I _{(-\mathcal {K})}\mathbf {b} = \hat{\mathtt A} _{\mathcal {K},\mathcal {K}}\mathbf {b}_\mathcal {K}\) we rearrange the equation as:

$$\begin{aligned} (\mathtt I- \hat{\mathtt A} _{\mathcal {K},\mathcal {K}}) \mathtt A _\mathcal {K}\bar{\mathbf {x}}_{(\mathcal {K})} + \hat{\mathtt A} _{\mathcal {K},\mathcal {K}}\mathbf {b}_\mathcal {K}=\mathtt A _\mathcal {K}\bar{\mathbf {x}}. \end{aligned}$$
(28)

We subtract \(\mathbf {b}_\mathcal {K}\) from each side of the equation:

$$\begin{aligned} \left( \mathtt I- \hat{\mathtt A} _{\mathcal {K},\mathcal {K}}\right) \left( \mathtt A _\mathcal {K}\bar{\mathbf {x}}_{(\mathcal {K})} - \mathbf {b}_\mathcal {K}\right) =\mathtt A _\mathcal {K}\bar{\mathbf {x}} - \mathbf {b}_\mathcal {K}, \end{aligned}$$
(29)

and arrive at:

$$\begin{aligned} \mathtt A _\mathcal {K}\bar{\mathbf {x}}_{(\mathcal {K})} - \mathbf {b}_\mathcal {K}={\left( \mathtt I- \hat{\mathtt A} _{\mathcal {K},\mathcal {K}}\right) }^{-1} \left( \mathtt A _\mathcal {K}\bar{\mathbf {x}} - \mathbf {b}_\mathcal {K}\right) , \end{aligned}$$
(30)

concluding the proof. \(\square \)
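
Equation (30) can likewise be checked numerically. The sketch below (hypothetical synthetic data, writing \(\hat{\mathtt A} = \mathtt A \mathtt A ^{\dag }\) for the hat matrix as above) compares the closed-form hold-out residual with the one obtained by refitting without the block \(\mathcal {K}\):

    import numpy as np

    # Numerical check of Proposition 1 (eq. 30) on synthetic data: the hold-out
    # prediction error on the block K equals (I - H_KK)^{-1} (A_K x_bar - b_K),
    # where H = A A^+ is the hat matrix and H_KK its K-by-K block.
    rng = np.random.default_rng(2)
    m, n = 40, 6
    A = rng.standard_normal((m, n))
    b = rng.standard_normal(m)
    K = np.array([3, 9, 20, 31])                              # coupled measurements held out together

    A_pinv = np.linalg.pinv(A)
    x_bar = A_pinv @ b                                        # full-data estimate
    H_KK = (A[K] @ A_pinv)[:, K]                              # K-by-K block of the hat matrix
    e_K = np.linalg.solve(np.eye(len(K)) - H_KK, A[K] @ x_bar - b[K])   # closed form

    keep = np.setdiff1d(np.arange(m), K)
    x_K = np.linalg.lstsq(A[keep], b[keep], rcond=None)[0]    # brute-force refit without K
    assert np.allclose(e_K, A[K] @ x_K - b[K])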


About this article


Cite this article

Bartoli, A. Generalizing the Prediction Sum of Squares Statistic and Formula, Application to Linear Fractional Image Warp and Surface Fitting. Int J Comput Vis 122, 61–83 (2017). https://doi.org/10.1007/s11263-016-0954-x

