Elsevier

Pattern Recognition Letters

Volume 31, Issue 12, 1 September 2010, Pages 1683-1692

Resolving stereo matching errors due to repetitive structures using model information

https://doi.org/10.1016/j.patrec.2010.05.020

Abstract

This study addresses the problem of incorrect stereo matches caused by repetitive structures in the scene. In stereo vision, repetitive structures may lead to “phantom objects” in front of or behind the true scene, which cause severe problems in scenarios involving mobile robot navigation or human–robot interaction. To alleviate this problem, we propose a model-based method which is independent of the specific stereo algorithm used. The basic idea is to feed application-dependent model information back into the correspondence analysis procedure without losing the ability to reconstruct scene parts not described by the model. The employed scene models may consist either of a single plane or (for modelling more complex objects) of several connected planes. An FFT-based detection stage extracts the scene parts displaying repetitive structures and yields the orientation of the model plane, while the plane distance is inferred with a robust optimisation technique based on a model-free stereo analysis. Alternatively, motion-based segmentation can be applied. Our experimental evaluation, performed on manually labelled real-world scenes showing objects in front of repetitive structures, shows that the proposed method reduces the fraction of false correspondences on the repetitive structures by factors of up to 30 while only moderately decreasing the fraction of 3D points correctly assigned to the object.

Introduction

Three-dimensional scene reconstruction is essential for applications in fields such as mobile robot navigation, automotive driver assistance systems, or human–machine interaction. A stereo camera system is an appropriate sensor for such applications due to its high lateral resolution and low cost.

Conventional stereo algorithms tend to generate false correspondences in the presence of repetitive structures as a consequence of ambiguities occurring during correspondence analysis. This problem is encountered especially when local methods are used (cf. Brown et al., 2003 for an overview). In many real-time applications, local methods are favoured for their low computational complexity, e.g. in the domains of driver assistance systems (Franke et al., 2005), safe human–robot interaction (Schmidt et al., 2007), or navigation of planetary rovers (Matthies et al., 2007). But even recent dense global stereo analysis techniques such as the semi-global matching approach (Hirschmüller, 2005) are only partially able to resolve ambiguities due to repetitive structures. Fig. 1 shows the 3D reconstruction results of different stereo methods for a scene displaying a planar chequerboard pattern.
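The origin of such false correspondences can be illustrated with a little arithmetic. The sketch below uses the standard rectified-stereo relation z = f·l/d together with the focal length (f = 1350 pixels) and baseline (l = 100 mm) reported in the experimental evaluation. The scene distance of 1500 mm and the pattern period of 30 pixels are hypothetical values chosen for illustration, and the function names are not taken from the paper.

```python
def depth_from_disparity(d_px, f_px=1350.0, baseline_mm=100.0):
    """Depth in mm for a rectified stereo pair: z = f * l / d."""
    if d_px <= 0:
        raise ValueError("disparity must be positive")
    return f_px * baseline_mm / d_px


def phantom_depths(z_true_mm, period_px, f_px=1350.0, baseline_mm=100.0, max_shift=2):
    """Depths obtained when the matcher locks onto a neighbouring pattern period.

    A pattern that repeats every `period_px` pixels along the epipolar line
    matches equally well at disparities d_true + k * period_px, k = ±1, ±2, ...
    """
    d_true = f_px * baseline_mm / z_true_mm
    depths = {}
    for k in range(-max_shift, max_shift + 1):
        d = d_true + k * period_px
        if k != 0 and d > 0:
            depths[k] = depth_from_disparity(d, f_px, baseline_mm)
    return depths


if __name__ == "__main__":
    # Hypothetical scene: pattern at 1500 mm, repeating every 30 pixels.
    for shift, z in sorted(phantom_depths(1500.0, 30.0).items()):
        print(f"shift {shift:+d} periods -> phantom depth {z:.0f} mm")
```

Positive shifts place the phantom points in front of the true surface and negative shifts place them behind it, and the depth error grows quickly as the disparity shrinks.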

Many stereo algorithms attempt to avoid false correspondences by using well-known techniques such as the ordering constraint, the smoothness constraint, the geometric similarity constraint, or a left–right consistency check (Fua, 1993). Other approaches improve the 3D reconstruction result based on adaptive windows (Kanade and Okutomi, 1991) or multiple windows (Hirschmüller et al., 2002). Regarding repetitive structures, Di Stefano et al. (2004) assess the quality of the minimum of the cost function and the related disparity value by introducing a distinctiveness and a sharpness test to resolve ambiguities. Nedevschi et al. (2004) generally omit a match if more than one possible candidate exists.
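As an illustration of one of these techniques, a left–right consistency check for a single rectified scanline might look as follows. This is a minimal sketch of the generic idea, not the formulation of any of the cited papers; the function name, the tolerance of one pixel, and the sign convention (left pixel x corresponds to right pixel x − d) are assumptions.

```python
def left_right_consistent(disp_left, disp_right, tol=1):
    """Per-pixel validity flags for one rectified scanline.

    disp_left[x] is the disparity found for left pixel x, so its match lies
    at right pixel x - disp_left[x]. The match is accepted only if the
    disparity stored for that right pixel agrees within `tol` pixels.
    """
    width = len(disp_left)
    valid = []
    for x, d in enumerate(disp_left):
        xr = x - int(round(d))
        if 0 <= xr < width:
            valid.append(abs(d - disp_right[xr]) <= tol)
        else:
            # The match falls outside the right image and cannot be verified.
            valid.append(False)
    return valid


if __name__ == "__main__":
    left = [2, 2, 2, 2, 2, 2]
    right = [2, 2, 5, 2, 2, 2]  # one inconsistent right-image disparity
    print(left_right_consistent(left, right))
```

Note that on a strictly periodic pattern both matching directions can lock onto the same wrong period, so a check of this kind may accept a false correspondence.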

Some approaches handle erroneous stereo correspondences explicitly. Murray and Little (2004) use the RANSAC algorithm (Fischler and Bolles, 1981) to fit planes to the 3D points in order to detect and eliminate gross errors. Sepehri et al. (2004) use a similar approach to fit a plane to the 3D points of an object using an M-estimator technique (Huber, 1981, Rey, 1983).
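The plane-fitting idea can be sketched in a few lines. The following is a generic RANSAC plane fit in plain Python, not the implementation of Murray and Little (2004); the inlier threshold, iteration count, fixed random seed, and function names are illustrative assumptions.

```python
import random


def plane_from_points(p, q, r):
    """Unit normal n and offset d of the plane n.x = d through three points.

    Returns None if the points are (nearly) collinear.
    """
    u = [q[i] - p[i] for i in range(3)]
    v = [r[i] - p[i] for i in range(3)]
    n = [u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0]]
    norm = sum(c * c for c in n) ** 0.5
    if norm < 1e-12:
        return None
    n = [c / norm for c in n]
    return n, sum(n[i] * p[i] for i in range(3))


def ransac_plane(points, threshold=10.0, iterations=200, seed=0):
    """Return the plane supported by the most points, and those inliers.

    `threshold` is the maximum point-to-plane distance (same unit as the
    points) for a point to count as an inlier.
    """
    rng = random.Random(seed)
    best_plane, best_inliers = None, []
    for _ in range(iterations):
        plane = plane_from_points(*rng.sample(points, 3))
        if plane is None:
            continue
        n, d = plane
        inliers = [pt for pt in points
                   if abs(sum(n[i] * pt[i] for i in range(3)) - d) <= threshold]
        if len(inliers) > len(best_inliers):
            best_plane, best_inliers = plane, inliers
    return best_plane, best_inliers
```

Points that end up outside the inlier set are candidates for gross errors, such as phantom points produced by a repetitive pattern.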

This contribution presents a novel method to cope with repetitive structures in stereo analysis, which can be applied independently of the specific stereo algorithm used. In a first step, a 3D reconstruction of the scene is determined by conventional correspondence analysis, which yields both correct and incorrect 3D points. An application-dependent scene model or object model is then adapted to the initial 3D points, which yields a model pose. The model pose is used to perform a refined correspondence analysis by incorporating the distance of the 3D points to the model into the cost function on which the correspondence analysis is based.
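The refinement step can be pictured with a toy example. This excerpt does not give the actual cost function, so the sketch below simply assumes an additive penalty proportional to the distance between the reconstructed point and the model, and it assumes a fronto-parallel model plane at depth `z_plane_mm`. The weight, the candidate disparities, and the matching costs are hypothetical; f and the baseline are the values from the experimental setup.

```python
def refine_disparity(candidates, match_costs, z_plane_mm, weight=0.001,
                     f_px=1350.0, baseline_mm=100.0):
    """Pick the disparity minimising matching cost plus a model-distance penalty.

    Assumed cost (not from the paper): c(d) + weight * |z(d) - z_plane|,
    with z(d) = f * l / d and a fronto-parallel model plane.
    """
    best_d, best_cost = None, float("inf")
    for d, c in zip(candidates, match_costs):
        z = f_px * baseline_mm / d
        total = c + weight * abs(z - z_plane_mm)
        if total < best_cost:
            best_d, best_cost = d, total
    return best_d


if __name__ == "__main__":
    # Repetitive pattern: three nearly indistinguishable matches.
    print(refine_disparity([60, 90, 120], [0.10, 0.11, 0.10], 1500.0))
    # Object in front of the plane: one clearly better match off the model.
    print(refine_disparity([90, 150], [0.80, 0.05], 1500.0))
```

With a moderate weight, the penalty decides between otherwise ambiguous candidates on the repetitive structure, while a clearly better match away from the model can still win. That is the behaviour needed to keep reconstructing scene parts not described by the model.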

Section snippets

Resolving matching errors using model information

The proposed approach is formulated for general use with an arbitrary stereo algorithm. This section provides a short overview of local and global methods for stereo image analysis along with a description of the stereo system, the employed models, and the pose estimation approaches used for the experimental evaluation presented in Section 3.

Experimental evaluation

In this section we describe an experimental evaluation of the proposed method for resolving stereo matching errors. For image acquisition, we utilised a PointGrey Digiclops camera system with an image size of 1024 × 768 pixels, a camera constant of 6 mm (corresponding to f = 1350 pixels), and a baseline distance of l = 100 mm. The images were rectified to standard epipolar geometry based on the algorithm by Fusiello et al. (2000). We regard three different scenes, each showing a small object in front of

Summary and conclusion

In this study we have examined the problem of incorrect stereo matches due to repetitive structures in the scene. The proposed model-based method is independent of the specific stereo algorithm used. We have employed scene models represented by a single plane or several connected planes. The parameters of the single-plane model are determined by an FFT-based approach which at the same time detects repetitive structures in the image. The multi-plane model is derived from an

References (35)

  • Amberg, B., Blake, A., Fitzgibbon, A., Romdhani, S., Vetter, T., 2007. Reconstructing high quality face-surfaces using...
  • Baker, H.H., Binford, T.O., 1981. Depth from edge and intensity based stereo. In: Proc. Internat. Joint Conf. on...
  • Barrois, B., et al. Spatio-temporal 3d pose estimation of objects in stereo images.
  • Besl, P.J., et al., 1992. A method for registration of 3-d shapes. IEEE Trans. Pattern Anal. Machine Intell.
  • Biber, P., Andreasson, H., Duckett, T., Schilling, A., 2004. 3d modeling of indoor environments by a mobile robot with...
  • Boykov, Y., et al., 2001. Fast approximate energy minimization via graph cuts. IEEE Trans. Pattern Anal. Machine Intell.
  • Brown, M.Z., et al., 2003. Advances in computational stereo. IEEE Trans. Pattern Anal. Machine Intell.
  • Cox, I.J., et al., 1996. A maximum likelihood stereo algorithm. Computer Vision and Image Understanding.
  • Di Stefano, L., et al., 2004. A PC-based real-time stereo vision system. Internat. J. Machine Graphics Vision.
  • Faugeras, O., et al., 1998. Variational principles, surface evolution, pde’s, level set methods and the stereo problem. IEEE Trans. Image Process.
  • Fielding, G., Kam, M., 1997. Applying the hungarian method to stereo matching. In: Proc. IEEE Conf. on Decision and...
  • Fischler, M.A., et al., 1981. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM.
  • Franke, U., et al. 6D-vision: Fusion of stereo and motion for robust environment perception.
  • Fua, P., 1993. A parallel stereo algorithm that produces dense depth maps and preserves image features. Machine Vision Appl.
  • Fusiello, A., et al., 2000. A compact algorithm for rectification of stereo pairs. Machine Vision Appl.
  • Hahn, M., Krüger, L., Wöhler, C., Groß, H.-M., 2007. Tracking of human body parts using the multiocular contracting...
  • Heap, T., Hogg, D., 1996. Toward 3D hand tracking using a deformable model. In: Proc. IEEE Internat. Conf. on Automatic...