Uncertainty analysis of 3D reconstruction from uncalibrated views

https://doi.org/10.1016/S0262-8856(99)00072-4

Abstract

We consider reconstruction algorithms that use points tracked over a sequence of (at least three) images to estimate the positions of the cameras (motion parameters), the 3D coordinates of the points (structure parameters), and the calibration matrix of the cameras (calibration parameters). Many algorithms have been reported in the literature, and there is a need to know how well they may perform. We show how the choice of assumptions on the camera intrinsic parameters (either fixed, or with a probabilistic prior) influences the precision of the estimator. We associate a Maximum Likelihood estimator with each type of assumption, and derive their covariance matrices analytically, independently of any specific implementation. We verify that the obtained covariance matrices are realistic, and compare the relative performance of each type of estimator.

Introduction

The problem of 3D reconstruction from images has drawn considerable attention. We focus on the problem of reconstruction from matched points (corners). The parameters of interest are the structure parameters, i.e. the 3D coordinates of the points; the motion parameters, which describe the positions of the cameras; and the calibration parameters, which describe the intrinsic characteristics of the sensors used. The case of known intrinsic parameters has been thoroughly studied in photogrammetry [1]. Work on uncalibrated reconstruction has progressed dramatically in recent years with the works of Hartley [2], Faugeras [3], Maybank [4], and Pollefeys et al. [5], who showed how to obtain projective, affine, and, finally, Euclidean reconstructions from uncalibrated views. We are interested in Euclidean reconstruction. Many algorithms have been proposed, differing, e.g., in the assumptions concerning the calibration parameters and/or motion [6]. Studies of the precision of the estimation of the “fundamental matrix” [7] and the “trifocal tensor” [8], which represent multilinear constraints that tracked 2D features must satisfy, can be found in [9], [10], [11]. A study of critical (pathological) cases for self-calibration can be found in Ref. [12], and the achievable precision in the calibrated case is addressed in Ref. [13].

In this paper, we study the precision with which 3D points, camera orientation, position and calibration are estimated. In some studies [14], [15], some intrinsic parameters are fixed to nominal values. We want to compare, in terms of precision, the effect of these assumptions with the precision achieved in the calibrated case. One contribution of this paper is this comparison of the precisions of calibrated and uncalibrated reconstruction. Although the former always performs better, experiments show that when more than ten images are available, uncalibrated reconstruction performs honorably.

Errors in the localization of image features introduce errors in the reconstruction. Some algorithms are numerically unstable, either intrinsically or in conjunction with particular setups of points and/or cameras. However, an in-depth study of the precision of these algorithms has not been presented. The issue of the accuracy of uncalibrated reconstruction has been raised and studied repeatedly, but always in association with a particular algorithm. Our aim is to give a more general treatment of the question, while remaining as independent as possible of any particular implementation.

Most algorithms combine an “algebraic” part and an optimization part that solves for a Maximum Likelihood [2] (or related [16]) estimate. Maximum Likelihood (ML) and related estimators are often reported [16] to converge to the solution only if started close to it. It is the purpose of the “algebraic” algorithm to provide the starting position. In this paper, we study the precision of the ML-like estimator, not that of the algebraic algorithm. The true parameters are considered as random variables with a distribution that is defined from the observations. The estimator is defined by the observation model, independently of any specific algorithm; we derive its covariance matrix analytically in various cases of interest. We distinguish the cases in which only the observations are available (ML estimation) from those where some knowledge of the estimated quantities is available a priori. Among the latter, we further distinguish the case of probabilistic knowledge (maximum a posteriori estimation) from that of “exact” knowledge, where some parameters are fixed.

When estimating all parameters from only the observations, the estimation is often numerically ill-posed. For example, in Ref. [14] some intrinsic parameters are highly correlated with some of the motion parameters, and the focal length is correlated with the depth (cinema uses the fact that zooming is almost indistinguishable from forward motion).
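The zoom versus forward-motion ambiguity can be illustrated on a toy problem. The sketch below is not taken from the paper: it assumes a one-dimensional pinhole model u = f·X/(Z + tz) with known structure, and the function name `focal_depth_correlation` and all numerical values are hypothetical. It computes the correlation coefficient between the estimates of the focal length f and the forward translation tz from the first-order covariance (JᵀJ)⁻¹.

```python
import numpy as np

def focal_depth_correlation(depth_spread, f=1.0, tz=0.0, n_points=50, seed=0):
    """Correlation between the estimates of f and tz in a toy 1D pinhole model.

    Assumed model (illustrative only): u_p = f * X_p / (Z_p + tz), with the
    structure (X_p, Z_p) known and i.i.d. Gaussian noise on u_p.
    depth_spread must be > 0; with zero depth variation J^T J is singular.
    """
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1.0, 1.0, n_points)
    # Depths centred on 5 with a controllable spread.
    Z = 5.0 + depth_spread * rng.uniform(-1.0, 1.0, n_points)
    d = Z + tz
    # Jacobian of the predictions with respect to (f, tz).
    J = np.column_stack([X / d, -f * X / d**2])
    # First-order covariance, up to the noise variance sigma^2.
    cov = np.linalg.inv(J.T @ J)
    return cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])

shallow = focal_depth_correlation(depth_spread=0.1)  # nearly constant depth
deep = focal_depth_correlation(depth_spread=4.0)     # strong depth variation
print(f"correlation (shallow scene): {shallow:.4f}")
print(f"correlation (deep scene):    {deep:.4f}")
```

Under these assumptions, the two Jacobian columns become nearly collinear when all points lie at almost the same depth, so the correlation approaches 1; a scene with more depth variation decorrelates the two parameters.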

If some calibration parameters are fixed, they may be removed from the estimated vector. This simplifies the study and implementation of the estimator and, presumably, improves the numerical stability. Typical assumptions are that pixels are rectangular or square, or that the principal point coincides with the image center [5], [15]. We verify in Section 5.1 the effect on precision of fixing the intrinsic parameters, either to values obtained from a pre-calibration step or to nominal values (corresponding to square pixels and a centered principal point).

Finally, the likelihood function may be modified to take into account a priori knowledge expressed probabilistically, e.g. by assuming that structure or calibration follows a known distribution. One then performs maximum a posteriori (MAP) estimation. A prior on structure is most often used to recover the intrinsic parameters precisely; the procedure is then called calibration from a known object.

A prior on the calibration parameters may come either from a previous calibration step or from assuming that the camera parameters follow a “nominal” distribution, e.g. that the expected value of the principal point is the center of the image and that its standard deviation is approximately 10% of the image size. This is the probabilistic counterpart of fixing the principal point to the image center, in Section 2.2. In terms of the theoretical precision, priors are preferable to fixed parameters.
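A two-parameter toy problem gives a feel for this comparison. The sketch below is an illustration under stated assumptions, not the paper's derivation: H is a hypothetical 2×2 Fisher information matrix for one parameter of interest a and one calibration parameter k, the prior on k is Gaussian with standard deviation `prior_std`, and the nominal value used when k is fixed is assumed to be wrong by an error of that same standard deviation. The function name `compare_prior_vs_fixed` is likewise hypothetical.

```python
import numpy as np

def compare_prior_vs_fixed(H, prior_std):
    """First-order error of parameter a under four treatments of parameter k.

    H is an assumed 2x2 Fisher information matrix for (a, k).
    Returns (var_ml, var_map, var_fixed_exact, mse_fixed_nominal).
    """
    A, B = H[0, 0], H[0, 1]
    lam = 1.0 / prior_std**2  # precision of the Gaussian prior on k

    # Plain ML: both parameters estimated from the observations only.
    var_ml = np.linalg.inv(H)[0, 0]
    # MAP: the prior adds its precision to the information on k.
    var_map = np.linalg.inv(H + np.diag([0.0, lam]))[0, 0]
    # Restricted ML with k fixed to its exact value.
    var_fixed_exact = 1.0 / A
    # Restricted ML with k fixed to a nominal value that is off by an error
    # of standard deviation prior_std: that error leaks into a with gain B/A.
    mse_fixed_nominal = 1.0 / A + (B / A) ** 2 * prior_std**2
    return var_ml, var_map, var_fixed_exact, mse_fixed_nominal

H = np.array([[4.0, 3.0],
              [3.0, 4.0]])
names = ("ML", "MAP", "fixed (exact)", "fixed (nominal)")
for name, value in zip(names, compare_prior_vs_fixed(H, prior_std=0.5)):
    print(f"{name:16s} {value:.4f}")
```

In this toy setting, fixing k to its exact value gives the smallest variance, but when the fixed value is only nominal, the MAP estimator has a lower mean squared error than the restricted estimator, and both improve on plain ML.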

We will write analytically the covariance matrices corresponding to the studied cases. The diagonal terms correspond to the variances of the individual estimated parameters. The validity of our analytical expressions is verified by comparing the theoretical and the observed behavior of a reconstruction algorithm, in Section 5.1. One important contribution of this paper lies in showing how large the variances of the considered estimators are in practice.

Section snippets

Notations

We now define the notations used throughout the paper. We consider that a set of P points has been tracked over a sequence of N images. The following notation is adopted:

  • p∈{1,2,…,P} and n∈{1,2,…,N} are the indices used for numbering points and images, respectively.

  • x_p∈R^3 is the vector of the coordinates, in the world frame, of the pth point. Its components are indexed by i∈{1,2,3}. The symbol X shall denote the 3D coordinates of all the points x_1,…,x_P.

The projection of these 3D points in the image depends

Covariance of estimators

We derive the covariance matrices of estimators for three possible cases: the “plain” maximum likelihood (ML) estimator, defined from the observations only; the maximum a posteriori (MAP) estimator, obtained when a probabilistic prior is available for a subset of the estimated parameters; and finally the “restricted” ML estimator, in which a subset of the estimated parameters is fixed to given values. The obtained expressions, some of which are identical to those in [19], [20], only involve Q

Specialization to the problem of reconstruction

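The above formulas hold for any estimator of the considered types (ML, MAP or “restricted” ML). We now specialize them to the case of Gaussian noise, when the log-likelihood is a sum of squared differences between observations and predictions:

$$D_{\Theta_i} Q = \sum_{npk} D_{\Theta_i} v_{npk}\,(v_{npk} - u_{npk})/\sigma^2$$

$$D^2_{\Theta_i \Theta_j} Q = \frac{1}{\sigma^2} \sum_{npk} \left[ D_{\Theta_i} v_{npk}\, D_{\Theta_j} v_{npk} + D^2_{\Theta_i \Theta_j} v_{npk}\,(v_{npk} - u_{npk}) \right]$$

$$D^2_{\Theta_i u_{npk}} Q = -D_{\Theta_i} v_{npk}/\sigma^2$$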
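As a numerical illustration of the Gaussian case, the sketch below compares a first-order covariance with the empirical covariance of a least-squares estimator over repeated noisy trials. It is an assumption-laden toy, not the paper's experiment: the observation model is a one-dimensional pinhole projection v_p = f·X_p/(Z_p + tz) with Θ = (f, tz), the structure is treated as known, and all function names are hypothetical. Since the residual term of the second derivative has zero mean, the expected Hessian reduces to JᵀJ/σ², and the predicted covariance is σ²(JᵀJ)⁻¹, where J is the Jacobian of the predictions v with respect to Θ.

```python
import numpy as np

def predict(theta, X, Z):
    """Toy predictions v(theta) for theta = (f, tz); illustrative model only."""
    f, tz = theta
    return f * X / (Z + tz)

def jacobian(theta, X, Z):
    """Derivatives of the predictions with respect to (f, tz)."""
    f, tz = theta
    d = Z + tz
    return np.column_stack([X / d, -f * X / d**2])

def analytic_covariance(theta, X, Z, sigma):
    """First-order covariance sigma^2 (J^T J)^-1, evaluated at theta."""
    J = jacobian(theta, X, Z)
    return sigma**2 * np.linalg.inv(J.T @ J)

def gauss_newton(u, theta0, X, Z, n_iter=10):
    """Least-squares estimate by Gauss-Newton, started at theta0."""
    theta = np.array(theta0, dtype=float)
    for _ in range(n_iter):
        r = predict(theta, X, Z) - u          # residuals v - u
        J = jacobian(theta, X, Z)
        theta -= np.linalg.solve(J.T @ J, J.T @ r)
    return theta

def empirical_covariance(theta_true, X, Z, sigma, n_trials=2000, seed=0):
    """Covariance of the estimates over repeated noisy observations."""
    rng = np.random.default_rng(seed)
    clean = predict(theta_true, X, Z)
    estimates = np.array([
        gauss_newton(clean + sigma * rng.standard_normal(clean.shape),
                     theta_true, X, Z)
        for _ in range(n_trials)
    ])
    return np.cov(estimates, rowvar=False)

rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, 40)
Z = rng.uniform(2.0, 8.0, 40)
theta_true = np.array([1.0, 0.5])
print("predicted std:", np.sqrt(np.diag(analytic_covariance(theta_true, X, Z, 1e-3))))
print("observed std: ", np.sqrt(np.diag(empirical_covariance(theta_true, X, Z, 1e-3))))
```

The optimizer is started at the true parameters on purpose: the sketch checks the precision of the estimator itself, not the ability of an initialization step to reach the right basin of convergence.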

A first practical consideration: notice that in the previous section, one may perform the Taylor series expansions around Θ rather than around Θ̂. One would

Experimental results

We performed various experiments (real and simulated) to study the performance of the estimators. The errors on the parameters X, W, T and K are studied separately. For X and K, which are normalized so that E(‖x_p‖²)=1 and E(‖K‖²)≃1, the error measures are the standard deviations of x̂_p−x_p and K̂−K. For T, the standard deviation of t̂−t is used too. We saw in Section 2.5 that T is expressed in the same unit as X. For W, the measure is the standard deviation of the angle formed between

Conclusions

We have analyzed the problem of the precision that is achievable in 3D reconstruction from uncalibrated views. Although a lot of work has been carried out on various forms of reconstruction, the problem of precision evaluation is seldom addressed in a systematic way.

We have formulated the problem in a probabilistic framework. We further considered that various types of prior information may be available and defined the corresponding estimators.

One contribution of this work is the analytical

Acknowledgements

This work has been supported by projects INCO COPERNICUS Proj. 960174-VIRTUOUS and PRAXIS 2/2.1/TPAR/2074/95.

References (21)

  • P.H.S. Torr et al.

    Robust detection of degenerate configurations for the fundamental matrix

    Computer Vision and Image Understanding

    (1998)
  • P.R. Wolf

    Elements of photogrammetry, with air photo interpretation and remote sensing

    (1983)
  • R.I. Hartley

    Euclidean reconstruction from uncalibrated views

    In Proc. 2nd Europe-U.S. Workshop on Invariance

    (1993)
  • O.D. Faugeras

    What can be seen in three dimensions with an uncalibrated stereo rig?

  • S.J. Maybank et al.

    Theory of self-calibration of a moving camera

    International Journal of Computer Vision

    (1992)
  • M. Pollefeys et al.

    Self-calibration and metric reconstruction in spite of varying and unknown internal camera parameters

    Proceedings of the Sixth ICCV

    (1998)
  • A.W. Fitzgibbon et al.

    Automatic 3D model construction for turn-table sequences

  • Gang Xu et al.

    Epipolar Geometry in Stereo, Motion and Object Recognition: A Unified Approach

    (1996)
  • A. Shashua

    Algebraic functions for recognition

    IEEE Transactions on PAMI

    (1994)
  • G. Csurka et al.

    Characterizing the uncertainty of the fundamental matrix

    Computer Vision and Image Understanding

    (1997)
There are more references available in the full text version of this article.
