Stereo matching algorithm based on per pixel difference adjustment, iterative guided filter and graph segmentation

https://doi.org/10.1016/j.jvcir.2016.11.016

Highlights

  • Added the programming language and library used in the manuscript.

  • Added the average time for the results in Table 2.

  • Added an extra table for the results based on (bad 2.0) pixel error in Table 6.

Abstract

Stereo matching is a difficult and challenging task because many uncontrollable factors affect the results. These factors include radiometric variations and illumination inconsistency. Absolute differences (AD) algorithms are fast, but they are too sensitive to noise and low-textured areas. Therefore, this paper proposes an improved algorithm to overcome these limitations. First, the proposed algorithm applies per-pixel difference adjustment to the AD and gradient matching costs to reduce radiometric distortions. Both differences are then combined with the census transform to reduce the effect of illumination variations. Second, a new iterative guided filter is introduced at the cost aggregation stage to preserve and improve object boundaries. Finally, undirected graph segmentation is used at the last stage to smooth the low-textured areas. Experimental results on standard indoor and outdoor datasets show that the proposed algorithm produces smooth and accurate disparity maps.

Introduction

In recent years there has been considerable progress in the field of image processing and computer vision. One of the popular topics in computer vision is the generation of a depth map or disparity map from a pair of stereo images. This topic plays an essential role in many applications such as three-dimensional (3D) scanning, 3D tracking, 3D reconstruction and autonomous navigation. Many research groups have studied this discipline in depth to develop mechanisms for 3D mapping. Stereo vision has consequently been developed with a focus on achieving low computational cost and high accuracy. It uses two parallel digital cameras to acquire the depth of a scene. Stereo cameras provide high-resolution images at low prices, and every pixel of the images can be used for other applications as well [1].

The matching process in stereo matching searches for the corresponding projections of the same scene point onto both camera planes. One of the problems associated with developing image matching is the computational cost required to achieve appropriate results [1]. The result of the matching process is presented as a disparity map. This map provides the depth data, which is important for 3D image reconstruction. Disparity map estimation consists of finding the correspondence for each pixel pair from the two images at designated coordinates (i.e., the reference image coordinates). Most stereo matching algorithms follow the four-step taxonomy proposed by Scharstein and Szeliski [2]. These steps are:

  • Step 1: Matching cost computation (i.e., matching each pixel of the left image against the right image).

  • Step 2: Cost aggregation (i.e., aggregating the initial costs over a support region).

  • Step 3: Disparity optimization (i.e., selecting the disparity level that optimizes the cost function).

  • Step 4: Disparity refinement (i.e., post-processing to refine the final disparity map).
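The four steps above can be sketched on a single rectified scanline. The block below is only an illustrative toy, not the algorithm proposed in this paper: the function name `match_scanline`, the absolute-difference cost, the box-average aggregation, the out-of-range cost of 255 and the 3-tap median refinement are all assumptions chosen to keep the example short.

```python
def match_scanline(left, right, max_disp, radius=1):
    """Toy four-step local stereo matcher for one rectified scanline."""
    width = len(left)
    big = 255.0  # assumed cost when the right-image pixel falls outside the row

    # Step 1: matching cost, absolute difference per disparity and pixel.
    cost = [[abs(left[x] - right[x - d]) if x - d >= 0 else big
             for x in range(width)]
            for d in range(max_disp + 1)]

    # Step 2: cost aggregation, box average over a (2*radius+1) window.
    def box(row):
        out = []
        for x in range(width):
            lo, hi = max(0, x - radius), min(width, x + radius + 1)
            out.append(sum(row[lo:hi]) / (hi - lo))
        return out

    aggregated = [box(row) for row in cost]

    # Step 3: disparity optimization, winner-take-all over disparity levels.
    disparity = [min(range(max_disp + 1), key=lambda d: aggregated[d][x])
                 for x in range(width)]

    # Step 4: disparity refinement, 3-tap median to remove isolated outliers.
    refined = disparity[:]
    for x in range(1, width - 1):
        refined[x] = sorted(disparity[x - 1:x + 2])[1]
    return refined
```

For a left row that is the right row shifted by two pixels, the interior of the returned list should be filled with the value 2.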

The disparity map algorithms can be classified as global, semi-global or local methods. This classification depends on how the disparity is calculated [2].

Global optimization methods treat disparity assignment as the problem of minimizing a predefined global energy function. They are usually less sensitive to local irregularities but tend to be more computationally expensive. The energy is measured from the global data with an additional smoothness constraint on neighbouring pixels [3]. Numerous methods have been proposed for solving the global energy minimization problem using a graph-based Markov Random Field (MRF) formulation [4]. These methods can be categorized as either graph cut (GC) methods [5] or belief propagation methods [6].
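A common way to write such a global energy, given here as a generic textbook form rather than the exact function used in any of the cited works, combines a data term with a smoothness term. The symbols are assumptions introduced for illustration: $C(p, d_p)$ is the matching cost of pixel $p$ at disparity $d_p$, $\mathcal{N}$ is the set of neighbouring pixel pairs, $V$ penalizes disparity differences between neighbours, and $\lambda$ balances the two terms.

```latex
E(d) = \sum_{p} C(p, d_p) \;+\; \lambda \sum_{(p,q) \in \mathcal{N}} V(d_p, d_q)
```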

The GC method finds the minimal energy solution by applying min-cut/max-flow algorithms to the flow network extracted from the MRF graph. In contrast, the belief propagation method minimizes the energy function by iteratively passing messages from the current node to neighboring nodes in the MRF graph. Mozerov et al. [4] used two energy minimization steps with a bilateral filter (BF) on a fully connected MRF. First, the minimum-energy marginals of the globally connected model are calculated. These values are then used in the second optimization step with a locally connected model. Their algorithm was able to reduce occlusion errors and increase efficiency, but the computational complexity of the two optimization steps also increased considerably.

Semi-global matching (SGM) is another method for finding corresponding pixels between a pair of stereo images. This method forms the matching cost using a local approximation and aggregates it with a global cost function across the entire 2D MRF image along linear 1D pixel paths [7]. These paths may run in several directions through the image (e.g., 8 or 16 directions) to cover its structure. However, the cost of SGM depends on both the number of pixels and the disparity range, so it requires more temporary memory to operate. Wenzel et al. [8] implemented an SGM method that performs multi-baseline matching between a base image and all matched images.
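The 1D path aggregation can be illustrated with the standard SGM recurrence, in which the cost at each pixel is increased by the cheapest transition from the previous pixel on the path. The sketch below handles a single left-to-right path only; the function name `aggregate_path` and the penalty names `p1` (disparity change of one level) and `p2` (larger jumps) are assumptions for illustration, and a full implementation would sum several such paths.

```python
def aggregate_path(cost, p1, p2):
    """Aggregate matching costs along one left-to-right path.

    cost[x][d] is the matching cost of pixel x at disparity d.
    p1 penalizes a disparity change of one level, p2 any larger jump.
    """
    num_disp = len(cost[0])
    path = [list(cost[0])]  # the first pixel has no predecessor
    for x in range(1, len(cost)):
        prev = path[-1]
        prev_min = min(prev)
        row = []
        for d in range(num_disp):
            # Same disparity, or a large jump from the best previous level.
            candidates = [prev[d], prev_min + p2]
            if d > 0:
                candidates.append(prev[d - 1] + p1)
            if d < num_disp - 1:
                candidates.append(prev[d + 1] + p1)
            # Subtracting prev_min keeps the values bounded along the path.
            row.append(cost[x][d] + min(candidates) - prev_min)
        path.append(row)
    return path
```

The intermediate table has one entry per pixel and disparity level for every path, which is where the extra temporary memory mentioned above comes from.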

The modification of SGM by Bethmann et al. [9] produced the best matches in 3D space. This method transferred the cost calculation and aggregation from image space into object space. Sinha et al. [10] proposed an algorithm that uses sparse feature matching followed by iterative clustering steps. Local plane sweeps are then executed around each slanted plane to create out-of-plane parallax and matching cost estimations. Each pixel is assigned to one of the local plane hypotheses using SGM optimization. This technique delivers significant speedups for high-resolution images, but the algorithm still produces low accuracy in low-textured regions.

On the other hand, local methods determine the disparity using the correspondence between gray values or patterns within a given local support window. The local support window is a small group of pixels around the pixel of interest. These methods are also referred to as window-based or area-based methods. There are several window-based approaches, such as fixed window [11], multiple window [12], adaptive window [13], [14], [15] and segmentation-based [16]. Such approaches use only local information, so they have low computational requirements and a short runtime. The disparity map is determined by selecting the smallest matching cost value among the disparity candidates. This selection is known as the winner-take-all (WTA) approach. Although local methods can produce disparity maps quickly, their precision is low, especially in low-textured regions and at depth discontinuities. These problems can be reduced using image segmentation such as mean shift [3] or watershed segmentation [17].

One of the well-known local methods is the adaptive support weight (ASW) approach proposed by Yoon and Kweon [18]. The ASW is an adaptive method in which each pixel has a different support weight. Hosni et al. [19] proposed an ASW method using the guided filter (GF) developed by He et al. [20], [21] to reduce the edge-fattening problem at the cost aggregation stage. They performed color segmentation inside the GF window. Pixels with a color similar to that of the center pixel are given a high support weight. Zhang et al. [22] used an exponential function with different GF window sizes at the cost aggregation stage and applied the ASW technique to reduce discontinuity errors. Yang et al. [23] used two GF support windows to increase the edge-preserving efficiency, but this technique also increases the weight of invalid pixels. Zhu et al. [24] combined the GF-based ASW with modified dynamic programming. Their results show reduced errors in occluded and discontinuity regions. However, this method is unable to reduce errors in low-textured or flat areas. Zhu et al. [25] used an adaptive edge-preserving GF with a cross-based support window to resolve ambiguous pixels effectively. Kordelas et al. [26] proposed a new approach that uses a content-based GF and weighted SGM for accurate disparity map estimation. They used two support window sizes, one of which was assigned to each pixel based on the local image content around the pixel of interest. Their results demonstrated high accuracy on the Middlebury V2 database.
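To make the role of the GF in cost aggregation more concrete, the following is a minimal 1D sketch of the basic guided filter of He et al., in which the output is modelled as a local linear function of the guide. It is not the content-based, two-window or iterative variants discussed in this paper. The function names, the shrinking border windows and the parameters `radius` and `eps` are assumptions for illustration; in stereo matching the guide would be the reference image and `src` one slice of the cost volume.

```python
def box_mean(values, radius):
    """Mean over a (2*radius+1) window, shrunk at the borders."""
    n = len(values)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def guided_filter_1d(guide, src, radius, eps):
    """Filter src using guide as the edge reference (local linear model)."""
    mean_i = box_mean(guide, radius)
    mean_p = box_mean(src, radius)
    corr_ii = box_mean([g * g for g in guide], radius)
    corr_ip = box_mean([g * s for g, s in zip(guide, src)], radius)

    # Per-window linear coefficients: src ~ a * guide + b.
    # eps regularizes the variance so that flat windows give a close to 0.
    a = [(cip - mi * mp) / (cii - mi * mi + eps)
         for cip, cii, mi, mp in zip(corr_ip, corr_ii, mean_i, mean_p)]
    b = [mp - ak * mi for mp, ak, mi in zip(mean_p, a, mean_i)]

    # Average the coefficients of all windows that cover each pixel.
    mean_a = box_mean(a, radius)
    mean_b = box_mean(b, radius)
    return [ma * g + mb for ma, mb, g in zip(mean_a, mean_b, guide)]
```

Where the guide has an edge, the coefficient `a` stays large and the edge is carried into the output; where the guide is flat, `a` tends to zero and the filter reduces to local averaging.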

Based on studies of state-of-the-art algorithms, every class of method (i.e., global, semi-global, local) has its own advantages and disadvantages [27]. Global methods commonly deliver more accurate disparity maps [28]. However, they are time-consuming and have high computational complexity, so they are relatively slow and do not scale well to high-resolution images that require large storage. SGM methods execute faster and are more memory-efficient than global methods. These methods aim to minimize a global 2D energy function by solving a large number of 1D minimization problems [29]. Generally, state-of-the-art local methods lack accuracy but provide high speed and low computational complexity. However, recent studies show that local methods are also able to produce high accuracy. This is shown by the works of Kordelas et al. [14], Zhan et al. [30] and Peng et al. [31], which achieve high accuracy and fast execution. Their algorithms also rank at the top of the Middlebury stereo evaluation based on the lowest percentage of bad pixels [32]. Encouraged by this, the work proposed in this manuscript uses a local approach.
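The bad-pixel percentage used for such rankings counts the pixels whose disparity differs from the ground truth by more than a threshold (2.0 pixels for the "bad 2.0" measure). The sketch below shows the idea only; the function name and the use of `None` to mark pixels without ground truth are assumptions, and the official evaluation applies its own masks and conventions.

```python
def bad_pixel_percentage(estimated, ground_truth, threshold=2.0):
    """Percentage of valid pixels whose disparity error exceeds threshold.

    Both maps are lists of rows. Ground-truth entries of None mark pixels
    without a known disparity and are skipped (an assumed convention).
    """
    bad = valid = 0
    for est_row, gt_row in zip(estimated, ground_truth):
        for est, gt in zip(est_row, gt_row):
            if gt is None:
                continue
            valid += 1
            if abs(est - gt) > threshold:
                bad += 1
    return 100.0 * bad / valid if valid else 0.0
```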

This paper proposes a new local stereo matching algorithm to compute disparity maps more accurately. The proposed work combines three similarity measures: absolute differences (AD), gradient matching (GM) and census transform (CN). Per-pixel difference adjustment is applied to the AD and GM differences to increase the quality of the preliminary disparity map. The aggregation step uses a newly developed iterative GF, which increases the accuracy at edges and discontinuity areas. The final stage consists of post-processing steps using a weighted BF, undirected graph segmentation and a plane fitting process.
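Of the three measures, the census transform is the one that gives robustness to illumination changes, because it encodes only the ordering of intensities around a pixel. The following sketch shows a generic 3x3 census signature and its Hamming-distance cost. The window size, the clamped borders and the function names are assumptions, and the AD and GM terms with their per-pixel difference adjustment are not reproduced here because this excerpt does not define them.

```python
def census_3x3(image, y, x):
    """8-bit census signature of pixel (y, x); borders are clamped."""
    height, width = len(image), len(image[0])
    center = image[y][x]
    bits = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            ny = min(max(y + dy, 0), height - 1)
            nx = min(max(x + dx, 0), width - 1)
            # One bit per neighbour: 1 if it is darker than the centre.
            bits = (bits << 1) | (1 if image[ny][nx] < center else 0)
    return bits


def census_cost(left, right, y, x, d):
    """Hamming distance between the left signature at x and the right at x - d.

    The caller is expected to keep x - d inside the image.
    """
    return bin(census_3x3(left, y, x) ^ census_3x3(right, y, x - d)).count("1")
```

Adding a constant brightness offset to one image leaves every signature unchanged, so the census cost stays at zero where an AD cost would grow with the offset.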

The rest of this paper is organized as follows. The next section provides a detailed explanation of the proposed stereo matching algorithm. This is followed by a section on the experimental setup and results. The conclusion is provided in the last section.

The proposed stereo matching algorithm

Similar to a basic local stereo matching framework, the proposed algorithm involves the steps explained in Section 1. The block diagram of the proposed algorithm is shown in Fig. 1. The new matching cost computation uses a combination of three per-pixel difference measurements with an adjustment element. The cost aggregation is implemented using the iterative GF. Then, the optimization step uses a winner-take-all (WTA) strategy. The WTA strategy absorbs the minimal aggregated

Experimental results

Experiments were carried out to evaluate the performance of the proposed algorithm. Three different datasets have been used. They are the indoor scenes from the Middlebury V3 database [45], the outdoor dataset from the KITTI database [46] and images from Universiti Sains Malaysia Laboratory (USMLab). The quantitative results of experiments in this work will be presented by using the on-line Middlebury database. This is obtained by uploading the disparity maps into their on-line system. The

Conclusion

In this work, a new local stereo matching algorithm is presented. The proposed algorithm is able to reduce the errors and increase the accuracy on the Middlebury and KITTI datasets. The combination of matching cost functions (AD + GM + CN) is able to overcome the disadvantage of a single matching cost. Furthermore, the proposed iterative GF is able to improve and preserve the edges. The usage of undirected graph segmentation with plane-fitting process increases the robustness

Acknowledgement

This work was supported by Universiti Sains Malaysia (No: PLD-0025/13(R)) and Universiti Teknikal Malaysia Melaka.

References (54)

  • D. Scharstein et al.

    A taxonomy and evaluation of dense two-frame stereo correspondence algorithms

    Int. J. Comput. Vision

    (2002)
  • M.G. Mozerov et al.

    Accurate stereo matching by two-step energy minimization

    IEEE T. Image Process.

    (2015)
  • H.Q. Wang et al.

    Effective stereo matching using reliable points based graph cut

  • H. Hirschmüller

    Stereo processing by semiglobal matching and mutual information

    IEEE T. Pattern Anal.

    (2008)
  • K. Wenzel et al.

    Image acquisition and model selection for multi-view stereo

    Int. Arch. Photogram. Rem. Sens. Spatial. Inform. Sci.

    (2013)
  • F. Bethmann et al.

    Semi-global matching in object space

    Int. Arch. Photogram. Rem. Sens. Spatial. Inform. Sci.

    (2015)
  • S.N. Sinha et al.

    Efficient high-resolution stereo matching using local plane sweeps

  • G.W. Zheng et al.

    A fast stereo matching algorithm based on fixed-window

    Appl. Mech. Mater.

    (2013)
  • H. Hirschmüller et al.

    Real-time correlation-based stereo vision with reduced border errors

    Int. J. Comput. Vision

    (2002)
  • S. Zhu et al.

    Local stereo matching using combined matching cost and adaptive cost aggregation

    KSII T. Internet Inf.

    (2015)
  • C. Yang et al.

    Real-time hardware stereo matching using guided image filter

  • X. Mei et al.

    Segment-tree based cost aggregation for stereo matching

  • K.-J. Yoon et al.

    Adaptive support-weight approach for correspondence search

    IEEE T. Pattern Anal.

    (2006)
  • A. Hosni et al.

    Real-time local stereo matching using guided image filtering

  • K. He et al.

    Guided image filtering

  • K. He et al.

    Guided image filtering

    IEEE Trans. Pattern Anal. Machine Intell.

    (2013)
  • J. Zhang et al.

    Real-time gpu-based local stereo matching method

This paper has been recommended for acceptance by Zicheng Liu.