Abstract
Recent research on structure and motion recovery has focused on issues related to sensitivity and robustness of existing techniques. One possible reason is that in practical applications, the underlying assumptions made by existing algorithms are often violated. In this paper, we propose a framework for 3D reconstruction from short monocular video sequences taking into account the statistical errors in reconstruction algorithms. Detailed error analysis is especially important for this problem because the motion between pairs of frames is small and slight perturbations in its estimates can lead to large errors in 3D reconstruction. We focus on the following issues: physical sources of errors, their experimental and theoretical analysis, robust estimation techniques and measures for characterizing the quality of the final reconstruction. We derive a precise relationship between the error in the reconstruction and the error in the image correspondences. The error analysis is used to design a robust, recursive multi-frame fusion algorithm using “stochastic approximation” as the framework since it is capable of dealing with incomplete information about errors in observations. Rate-distortion analysis is proposed for evaluating the quality of the final reconstruction as a function of the number of frames and the error in the image correspondences. Finally, to demonstrate the effectiveness of the algorithm, examples of depth reconstruction are shown for different video sequences.
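To illustrate the stochastic approximation idea mentioned above, here is a minimal sketch; it is not the algorithm of the paper. It assumes a Robbins-Monro recursion with a sign update and a gain of 1/k, which tracks the median of a stream of per-frame depth estimates without needing a noise model. The function names and the synthetic data are hypothetical and chosen only for illustration.

```python
def synthetic_depths(n, true_depth=5.0, outlier=50.0):
    """Hypothetical per-frame depth estimates for one scene point.

    Inliers are spread deterministically over true_depth +/- 0.2;
    every tenth estimate is a gross outlier (e.g. a bad correspondence).
    """
    depths = []
    for i in range(n):
        if i % 10 == 9:
            depths.append(outlier)
        else:
            depths.append(true_depth + 0.04 * ((7 * i) % 11 - 5))
    return depths


def fuse_recursive(observations, gain=1.0):
    """Robbins-Monro recursion x_k = x_{k-1} + (gain / k) * sign(y_k - x_{k-1}).

    With a decreasing gain the iterate settles near the median of the
    observations, so isolated outliers have little influence.
    """
    stream = iter(observations)
    estimate = next(stream)  # initialise with the first observation
    for k, obs in enumerate(stream, start=2):
        step = gain / k
        # (obs > estimate) - (obs < estimate) evaluates to sign(obs - estimate)
        estimate += step * ((obs > estimate) - (obs < estimate))
    return estimate


if __name__ == "__main__":
    depths = synthetic_depths(2000)
    print("sample mean     :", sum(depths) / len(depths))
    print("recursive fusion:", fuse_recursive(depths))
```

In this toy setting the sample mean is pulled far from the true depth by the outliers, while the recursive estimate stays close to it. The sign update is one simple robust choice; other update functions and gain sequences are possible.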
Chowdhury, A.K.R., Chellappa, R. Stochastic Approximation and Rate-Distortion Analysis for Robust Structure and Motion Estimation. International Journal of Computer Vision 55, 27–53 (2003). https://doi.org/10.1023/A:1024488407740
DOI: https://doi.org/10.1023/A:1024488407740