Abstract
The prediction sum of squares statistic uses the principle of leave-one-out cross-validation in linear least squares regression. It is computationally attractive, as it can be computed non-iteratively. However, it has limitations: it does not handle coupled measurements, which should be held out simultaneously, and is specific to the principle of leave-one-out, which is known to overfit when used for selecting a model’s complexity. We propose multiple-exclusion PRESS (MEXPRESS), which generalizes PRESS to coupled measurements and other types of cross-validation, while retaining computational efficiency with the non-iterative MEXPRESS formula. Using MEXPRESS, various strategies to resolve overfitting can be efficiently implemented. The core principle is to exclude training data too ‘close’ or too ‘similar’ to the validation data. We show that this allows one to select the number of control points automatically in three cases: (i) the estimation of linear fractional warps for dense image registration from point correspondences, (ii) surface reconstruction from a dense depth-map obtained by a depth sensor and (iii) surface reconstruction from a sparse point cloud obtained by shape-from-template.
Notes
We use \(l_{\max } = 6\) for \(n=10\), \(l_{\max } = 7\) for \(n=15\) and \(l_{\max } = \min (\text {round}(\frac{m}{2}),100)\) otherwise.
References
Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, AC–19(6), 716–723.
Allen, D. M. (1971). Mean square error of prediction as a criterion for selecting variables. Technometrics, 13(3), 469–475.
Bartoli, A. (2008). Maximizing the predictivity of smooth deformable image warps through cross-validation. Journal of Mathematical Imaging and Vision, 31(2–3), 133–145.
Bartoli, A. (2009). On computing the prediction sum of squares statistic in linear least squares problems with multiple parameter or measurement sets. International Journal of Computer Vision, 85(2), 133–142.
Bartoli, A., Perriollat, M., & Chambon, S. (2010). Generalized thin-plate spline warps. International Journal of Computer Vision, 88(1), 85–110.
Bartoli, A., Pizarro, D., & Collins, T. (2013). A robust analytical solution to isometric shape-from-template with focal length calibration. In International conference on computer vision.
Bartoli, A., Gérard, Y., Chadebecq, F., Collins, T., & Pizarro, D. (2015). Shape-from-template. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(10), 2099–2118.
Bishop, C. M. (1995). Neural networks for pattern recognition. Oxford: Oxford University Press.
Bookstein, F. L. (1989). Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(6), 567–585.
Breiman, L. (1993). Fitting additive models to regression data. Computational Statistics and Data Analysis, 15, 13–46.
Brunet, F., Bartoli, A., Malgouyres, R., & Navab, N. (2009). NURBS warps. In British machine vision conference.
Dierckx, P. (1981). An algorithm for surface-fitting with spline functions. IMA Journal of Numerical Analysis, 1(3), 267–283.
Dierckx, P. (1993). Curve and surface fitting with splines. Oxford: Oxford University Press.
Forsyth, D., & Ponce, J. (2003). Computer vision: A modern approach (2nd ed.). Upper Saddle River: Prentice Hall.
Friedman, J. H. (1991). Multivariate adaptive regression splines. The Annals of Statistics, 19(1), 1–67.
Hartley, R. I., & Zisserman, A. (2003). Multiple view geometry in computer vision (2nd ed.). Cambridge, MA: Cambridge University Press.
Hastie, T., Tibshirani, R., & Friedman, J. (2001). The elements of statistical learning. Berlin: Springer.
Jupp, D. L. B. (1978). Approximation to data by splines with free knots. SIAM Journal on Numerical Analysis, 15(2), 328–343.
Jurie, F., & Triggs, B. (2005). Creating efficient codebooks for visual recognition. In International conference on computer vision.
Lloyd, S. P. (1982). Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2), 129–137.
Miyata, S., & Shen, X. (2005). Free-knot splines and adaptive knot selection. Journal of the Japanese Statistical Society, 35(2), 303–324.
Molinari, N., Durand, J.-F., & Sabatier, R. (2004). Bounded optimal knots for regression splines. Computational Statistics and Data Analysis, 45, 159–178.
Nguyen, N., Milanfar, P., & Golub, G. (2001). Efficient generalized cross-validation with applications to parametric image restoration and resolution enhancement. IEEE Transactions on Image Processing, 10(9), 1299–1308.
Perriollat, M., & Bartoli, A. (2013). A computational model of bounded developable surfaces with application to image-based 3D reconstruction. Computer Animation and Virtual Worlds, 24(5), 459–476.
Perriollat, M., Hartley, R., & Bartoli, A. (2011). Monocular template-based reconstruction of inextensible surfaces. International Journal of Computer Vision, 95(2), 124–137.
Rueckert, D., Sonoda, L. I., Hayes, C., Hill, D. L. G., Leach, M. O., & Hawkes, D. J. (1999). Nonrigid registration using free-form deformations: Application to breast MR images. IEEE Transactions on Medical Imaging, 18(8), 712–721.
Salzmann, M., Pilet, J., Ilic, S., & Fua, P. (2007). Surface deformation models for nonrigid 3D shape recovery. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(8), 1–7.
Schölkopf, B., & Smola, A. J. (2001). Learning with kernels: Support vector machines, regularization, optimization, and beyond. Cambridge, MA: MIT Press.
Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6, 461–464.
Sevilla-Lara, L., & Learned-Miller, E. (2012). Distribution fields for tracking. In International conference on computer vision and pattern recognition.
Steinbrücker, F., Kerl, C., Sturm, J., & Cremers, D. (2013). Large-scale multi-resolution surface reconstruction from RGB-D sequences. In International conference on computer vision.
Süssmuth, J., Meyer, Q., & Greiner, G. (2010). Surface reconstruction based on hierarchical floating radial basis functions. Computer Graphics Forum, 29(6), 1854–1864.
Tang, H., Joshi, N., & Kapoor, A. (2011). Learning a blind measure of perceptual image quality. In International conference on computer vision and pattern recognition.
Varol, A., Salzmann, M., Fua, P., & Urtasun, R. (2012). A constrained latent variable model. In International conference on computer vision and pattern recognition.
Xu, W., Salzmann, M., Wang, Y., & Liu, Y. (2014). Nonrigid surface registration and completion from RGBD images. In European conference on computer vision.
Yan, X., & Su, X.G. (2009). Linear regression analysis: Theory and computing. Singapore: World Scientific Publishing. ISBN 9812834109.
Zollhöfer, M., Niessner, M., Izadi, S., Rhemann, C., Zach, C., Fisher, M., et al. (2014). Real-time non-rigid reconstruction using an RGB-D camera. ACM Transactions on Graphics, 33(4), 156.
Acknowledgments
I thank Mathias Gallardo and Ajad Chhatkuli for their help in creating the surface fitting datasets and the authors of Varol et al. (2012) for the SfT dataset from their paper. This research has received funding from the EU’s FP7 through the ERC research Grant 307483 FLEXABLE.
Communicated by Jiri Matas.
Appendix: Proof of Proposition 1
The proof of Proposition 1 follows the same steps as the proof of the PRESS formula (Bartoli 2009). It requires the following lemma.
Lemma 1
The model estimate \(\bar{\mathbf {x}}_{(\mathcal {K})}\) obtained by holding out the set of measurements with index set \(\mathcal {K}\subseteq [1,m]\) is the same as the one obtained by replacing the \(\mathcal {K}\) responses by their predictions with the model \(\bar{\mathbf {x}}_{(\mathcal {K})}\):
$$\bar{\mathbf {x}}_{(\mathcal {K})} = {\mathtt A}^{\dag } {\tilde{\mathbf {b}}}_{(\mathcal {K})} \quad \text {with} \quad {\tilde{\mathbf {b}}}_{(\mathcal {K})} \mathop {=}\limits ^{ def }\mathtt I _{(\mathcal {K})} \mathbf {b} + \mathtt I _{(-\mathcal {K})} \mathtt A \bar{\mathbf {x}}_{(\mathcal {K})},$$
where we used the notation \(\mathtt I _{(\mathcal {K})}\) for an identity matrix whose diagonal entries with index in \(\mathcal {K}\) are put to zero, and \(\mathtt I _{(-\mathcal {K})} \mathop {=}\limits ^{ def }\mathtt I- \mathtt I _{(\mathcal {K})}\). Formally, \(\mathtt I _{(\mathcal {K})} = \text {diag}(\mathbf {d})\) with \(\mathbf {d}_\mathcal {K}= 0\) and \(\mathbf {d}_{-\mathcal {K}} = 1\).
Proof of Lemma 1
Using the definition of \({\tilde{\mathbf {b}}}_{(\mathcal {K})}\) we have:
$${\mathtt A}^{\dag } {\tilde{\mathbf {b}}}_{(\mathcal {K})} = {\mathtt A}^{\dag } \left( \mathtt I _{(\mathcal {K})} \mathbf {b} + \mathtt I _{(-\mathcal {K})} \mathtt A \bar{\mathbf {x}}_{(\mathcal {K})} \right).$$
Because \(\mathtt I _{(-\mathcal {K})} = \mathtt I- \mathtt I _{(\mathcal {K})}\) this may be expanded as:
$${\mathtt A}^{\dag } {\tilde{\mathbf {b}}}_{(\mathcal {K})} = {\mathtt A}^{\dag } \mathtt I _{(\mathcal {K})} \mathbf {b} + {\mathtt A}^{\dag } \mathtt A \bar{\mathbf {x}}_{(\mathcal {K})} - {\mathtt A}^{\dag } \mathtt I _{(\mathcal {K})} \mathtt A \bar{\mathbf {x}}_{(\mathcal {K})}.$$
The second term simplifies to \({\mathtt A}^{\dag } \mathtt A \bar{\mathbf {x}}_{(\mathcal {K})} = \bar{\mathbf {x}}_{(\mathcal {K})}\). For the third term, the definition of \(\bar{\mathbf {x}}_{(\mathcal {K})}\) as the minimizer of \(\Vert \mathtt I _{(\mathcal {K})} (\mathtt A \mathbf {x} - \mathbf {b}) \Vert ^2\) gives the normal equations \(\mathtt A ^\top \mathtt I _{(\mathcal {K})} \mathtt A \bar{\mathbf {x}}_{(\mathcal {K})} = \mathtt A ^\top \mathtt I _{(\mathcal {K})} \mathbf {b}\), hence \({\mathtt A}^{\dag } \mathtt I _{(\mathcal {K})} \mathtt A \bar{\mathbf {x}}_{(\mathcal {K})} = {\mathtt A}^{\dag } \mathtt I _{(\mathcal {K})} \mathbf {b}\). The overall expression thus simplifies to:
$${\mathtt A}^{\dag } {\tilde{\mathbf {b}}}_{(\mathcal {K})} = {\mathtt A}^{\dag } \mathtt I _{(\mathcal {K})} \mathbf {b} + \bar{\mathbf {x}}_{(\mathcal {K})} - {\mathtt A}^{\dag } \mathtt I _{(\mathcal {K})} \mathbf {b} = \bar{\mathbf {x}}_{(\mathcal {K})},$$
concluding the proof. \(\square \)
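As a sanity check, Lemma 1 can be verified numerically. The following sketch (not from the paper; it assumes NumPy, randomly drawn data and a full-column-rank design matrix) fits the model with some measurements held out, substitutes their predictions into the response vector, and refits on all measurements:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 20, 5
A = rng.standard_normal((m, n))   # full column rank with probability 1
b = rng.standard_normal(m)
K = [2, 7, 11]                    # held-out index set
kept = np.ones(m, dtype=bool)
kept[K] = False

# x̄_(K): least squares fit on the kept measurements only
x_heldout, *_ = np.linalg.lstsq(A[kept], b[kept], rcond=None)

# b̃_(K): replace the held-out responses by their predictions under x̄_(K)
b_tilde = b.copy()
b_tilde[K] = A[K] @ x_heldout

# Lemma 1: the full-data fit on b̃_(K) recovers x̄_(K)
x_refit = np.linalg.pinv(A) @ b_tilde
print(np.allclose(x_refit, x_heldout))  # True
```

The held-out rows contribute zero residual under \(\bar{\mathbf {x}}_{(\mathcal {K})}\) after the substitution, which is why the two fits coincide.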
Proof of Proposition 1
The case of an empty set \(\mathcal {K}\) is straightforward as then \(\mathbf {e}_{(\mathcal {K})}\) is a zero-length vector. We start by rewriting the predictions in \(\mathcal {K}\) by the model fitted with all measurements as \(\mathtt A _\mathcal {K}\bar{\mathbf {x}} = \mathtt A _\mathcal {K}{\mathtt A}^{\dag } \mathbf {b} = \hat{\mathtt A} _\mathcal {K}\mathbf {b}\). Similarly, we rewrite the predictions in \(\mathcal {K}\) by the model fitted with all but the measurements in \(\mathcal {K}\) as \(\mathtt A _\mathcal {K}\bar{\mathbf {x}}_{(\mathcal {K})} = \mathtt A _\mathcal {K}{\mathtt A}^{\dag } {\tilde{\mathbf {b}}}_{(\mathcal {K})} = \hat{\mathtt A} _\mathcal {K}{\tilde{\mathbf {b}}}_{(\mathcal {K})}\), where the first equality is given by Lemma 1. The difference between the two predictions is thus given by:
$$\mathtt A _\mathcal {K}\bar{\mathbf {x}} - \mathtt A _\mathcal {K}\bar{\mathbf {x}}_{(\mathcal {K})} = \hat{\mathtt A} _\mathcal {K}\left( \mathbf {b} - {\tilde{\mathbf {b}}}_{(\mathcal {K})} \right) = \hat{\mathtt A} _\mathcal {K}\left( \mathtt I- \mathtt I _{(\mathcal {K})} \right) \mathbf {b} - \hat{\mathtt A} _\mathcal {K}\mathtt I _{(-\mathcal {K})} \mathtt A \bar{\mathbf {x}}_{(\mathcal {K})},$$
where the second equality is obtained by using the definition of \({\tilde{\mathbf {b}}}_{(\mathcal {K})}\). We then use \(\mathtt I-\mathtt I _{(\mathcal {K})}=\mathtt I _{(-\mathcal {K})}\), leading to:
$$\mathtt A _\mathcal {K}\bar{\mathbf {x}} - \mathtt A _\mathcal {K}\bar{\mathbf {x}}_{(\mathcal {K})} = \hat{\mathtt A} _\mathcal {K}\mathtt I _{(-\mathcal {K})} \mathbf {b} - \hat{\mathtt A} _\mathcal {K}\mathtt I _{(-\mathcal {K})} \mathtt A \bar{\mathbf {x}}_{(\mathcal {K})}.$$
By noting that \(\hat{\mathtt A} _\mathcal {K}\mathtt I _{(-\mathcal {K})}\mathtt A = \hat{\mathtt A} _{\mathcal {K},\mathcal {K}}\mathtt A _\mathcal {K}\) and \(\hat{\mathtt A} _\mathcal {K}\mathtt I _{(-\mathcal {K})}\mathbf {b} = \hat{\mathtt A} _{\mathcal {K},\mathcal {K}}\mathbf {b}_\mathcal {K}\) we rearrange the equation as:
$$\mathtt A _\mathcal {K}\bar{\mathbf {x}} = \mathtt A _\mathcal {K}\bar{\mathbf {x}}_{(\mathcal {K})} + \hat{\mathtt A} _{\mathcal {K},\mathcal {K}} \left( \mathbf {b}_\mathcal {K}- \mathtt A _\mathcal {K}\bar{\mathbf {x}}_{(\mathcal {K})} \right).$$
We subtract \(\mathbf {b}_\mathcal {K}\) from each side of the equation:
$$\mathtt A _\mathcal {K}\bar{\mathbf {x}} - \mathbf {b}_\mathcal {K}= \left( \mathtt I- \hat{\mathtt A} _{\mathcal {K},\mathcal {K}} \right) \left( \mathtt A _\mathcal {K}\bar{\mathbf {x}}_{(\mathcal {K})} - \mathbf {b}_\mathcal {K}\right),$$
and arrive at:
$$\mathbf {e}_{(\mathcal {K})} = \mathtt A _\mathcal {K}\bar{\mathbf {x}}_{(\mathcal {K})} - \mathbf {b}_\mathcal {K}= \left( \mathtt I- \hat{\mathtt A} _{\mathcal {K},\mathcal {K}} \right) ^{-1} \left( \mathtt A _\mathcal {K}\bar{\mathbf {x}} - \mathbf {b}_\mathcal {K}\right),$$
concluding the proof. \(\square \)
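Proposition 1 can likewise be checked numerically by comparing the direct (refit-based) held-out prediction error with the non-iterative formula. The sketch below is not from the paper; it assumes NumPy and the random data shown, with \(\hat{\mathtt A} = \mathtt A {\mathtt A}^{\dag }\) the hat matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 20, 5
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
K = [0, 4, 9]                     # coupled measurements, held out together
kept = np.ones(m, dtype=bool)
kept[K] = False

A_pinv = np.linalg.pinv(A)
x_all = A_pinv @ b                # x̄: fit on all measurements
A_hat = A @ A_pinv                # hat matrix Â

# Direct computation: refit without the measurements in K
x_heldout, *_ = np.linalg.lstsq(A[kept], b[kept], rcond=None)
e_direct = A[K] @ x_heldout - b[K]

# Non-iterative formula: e_(K) = (I - Â_KK)^{-1} (A_K x̄ - b_K)
A_hat_KK = A_hat[np.ix_(K, K)]
e_formula = np.linalg.solve(np.eye(len(K)) - A_hat_KK, A[K] @ x_all - b[K])

print(np.allclose(e_direct, e_formula))  # True
```

Only one fit on all the data is needed; each held-out set \(\mathcal {K}\) then costs a single \(|\mathcal {K}| \times |\mathcal {K}|\) solve, which is what makes the formula computationally attractive for cross-validation.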
Bartoli, A. Generalizing the Prediction Sum of Squares Statistic and Formula, Application to Linear Fractional Image Warp and Surface Fitting. Int J Comput Vis 122, 61–83 (2017). https://doi.org/10.1007/s11263-016-0954-x