Robust non-local stereo matching for outdoor driving images using segment-simple-tree

doi:10.1016/j.image.2015.09.012

Signal Processing: Image Communication

Volume 39, Part A, November 2015, Pages 173-184

https://doi.org/10.1016/j.image.2015.09.012

Highlights

• We propose a non-local stereo matching algorithm for driver assistance systems.
• The disparity characteristics of outdoor driving images are demonstrated by analysis.
• We introduce a segment-simple-tree that is better suited to outdoor driving images than the minimum spanning tree.
• Qualitative and quantitative comparisons with existing methods are provided on three datasets.

Abstract

Non-local cost aggregation has recently emerged as a promising approach for stereo matching and has attracted much interest over the past few years. Most non-local algorithms are reportedly better than state-of-the-art local algorithms for high-quality indoor images. However, the accuracy of non-local algorithms is still limited for outdoor images. Computing disparity maps for outdoor images in driver assistance systems is one of the most actively researched topics in the field of stereo vision. In this paper, we present a robust non-local stereo matching algorithm that improves the performance of non-local approaches for outdoor driving images. The proposed algorithm is inspired by the non-local cost aggregation method based on a minimum spanning tree, and it improves the estimation accuracy by introducing an alternate, effective segment-simple-tree that is better suited to outdoor driving images than the minimum spanning tree. Experimental results showed that the proposed algorithm is superior to the existing local and non-local algorithms, and is comparable to semi-global matching.

Introduction

Binocular stereo matching is one of the most important problems in computer vision because it provides computers with a depth perception ability similar to that of humans. The goal is to estimate the disparity map from two rectified images of the same scene taken from left and right viewpoints. Binocular stereo matching has been extensively studied over the past few decades, and numerous algorithms have been proposed [1], [2], [3]. Nevertheless, it is still an active area of research because new challenges have surfaced and a variety of problems have not yet been solved. Among these challenges, computing the disparity maps for outdoor images in driver assistance systems (DAS) is one of the most actively researched topics in this field [4], [5], [6].

In general, a stereo matching algorithm consists of four steps: cost computation, cost aggregation, disparity optimization, and post-processing. In this paper, we mainly focus on the cost aggregation step since it has the greatest impact on the accuracy of the estimated disparity map. Most existing cost aggregation algorithms define a local 2-D support window for each pixel and perform summing/averaging operations using the information obtained from the pixels inside that window. State-of-the-art local algorithms include the adaptive support-weight approach [7], geodesic diffusion [8], and fast cost-volume filtering [9]. These were respectively inspired by bilateral filtering [10], anisotropic diffusion [11], and guided image filtering [12], which are the three best-known edge-aware image filtering methods.
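As a rough illustration of the local approach described above, the following Python sketch runs cost computation, window-based aggregation, and winner-takes-all disparity selection on small grayscale images stored as nested lists. The absolute-difference cost, the uniform box window, the `fill` value for out-of-range pixels, and all function names are assumptions made for illustration; the algorithms in [7], [8], [9] replace the uniform window with edge-aware weights.

```python
def absolute_difference_cost(left, right, max_disp, fill=255.0):
    """Cost volume with cost[d][y][x] = |left[y][x] - right[y][x - d]|.

    `fill` is used where x - d falls outside the right image.
    """
    h, w = len(left), len(left[0])
    return [[[abs(left[y][x] - right[y][x - d]) if x - d >= 0 else fill
              for x in range(w)]
             for y in range(h)]
            for d in range(max_disp + 1)]


def box_aggregate(cost_slice, radius):
    """Average one disparity slice over a (2r+1) x (2r+1) window, clipped at borders."""
    h, w = len(cost_slice), len(cost_slice[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            total = sum(cost_slice[j][i] for j in ys for i in xs)
            out[y][x] = total / (len(ys) * len(xs))
    return out


def winner_takes_all(volume):
    """Select, per pixel, the disparity with the lowest aggregated cost."""
    h, w = len(volume[0]), len(volume[0][0])
    return [[min(range(len(volume)), key=lambda d: volume[d][y][x])
             for x in range(w)]
            for y in range(h)]


def local_stereo(left, right, max_disp, radius=1):
    """Cost computation, box aggregation, then winner-takes-all."""
    volume = absolute_difference_cost(left, right, max_disp)
    aggregated = [box_aggregate(cost_slice, radius) for cost_slice in volume]
    return winner_takes_all(aggregated)
```

Because every pixel only sees its own window, the support region cannot extend across large textureless areas, which is the limitation that non-local aggregation is meant to address.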

Semi-global matching (SGM) [13] is one of the best performing stereo matching algorithms for outdoor driving images [4]. SGM performs the cost aggregation step using several global paths instead of the local pixel similarities used in the local approach. Even though this is an optimization-based approach that uses dynamic programming for each path, it can still be considered a cost aggregation step [14].
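The per-path dynamic programming can be sketched for a single left-to-right scanline as follows. This is a minimal illustration of the path recurrence as it is commonly stated for SGM [13], with a small penalty for disparity changes of one level and a larger penalty for bigger jumps; the function name and the argument names `p1` and `p2` are chosen here for illustration. A full SGM implementation sums such path costs over several directions before selecting the disparity.

```python
def aggregate_path_left_to_right(cost_row, p1, p2):
    """Path cost along one scanline.

    cost_row[x][d] is the matching cost of pixel x at disparity d.
    Returns path[x][d] with
      path[x][d] = cost_row[x][d]
                   + min(path[x-1][d],
                         path[x-1][d-1] + p1,
                         path[x-1][d+1] + p1,
                         min_k path[x-1][k] + p2)
                   - min_k path[x-1][k]
    """
    n_disp = len(cost_row[0])
    prev = list(cost_row[0])          # the first pixel has no predecessor
    path = [prev]
    for x in range(1, len(cost_row)):
        prev_min = min(prev)
        cur = []
        for d in range(n_disp):
            candidates = [prev[d], prev_min + p2]
            if d > 0:
                candidates.append(prev[d - 1] + p1)
            if d + 1 < n_disp:
                candidates.append(prev[d + 1] + p1)
            # Subtracting prev_min keeps the values bounded along the path.
            cur.append(cost_row[x][d] + min(candidates) - prev_min)
        path.append(cur)
        prev = cur
    return path
```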

In the past few years, non-local cost aggregation has emerged as a promising approach, since it is reportedly both more accurate and faster than existing local algorithms. The key idea behind its success is that it aggregates the matching cost using the information from all of the pixels in the image rather than only those inside a specific window, as the aforementioned local algorithms do [7], [8], [9]. Cigla et al. [15] proposed an information permeability algorithm with separable successive weighted summation along the horizontal and vertical directions, and an improved spatio-temporal information permeability filter is presented in [16]. Pham et al. [17] employed a sequence of 1-D filters to reduce the computation and memory costs relative to conventional 2-D filters by applying the domain transform [18].

Yang's method [19], [20], the most notable non-local algorithm, carries out the cost aggregation on a minimum spanning tree (MST). The MST is constructed using a subset of edges whose sum of edge-weights is minimal, and the matching cost is then aggregated on this tree within two passes, leaf-to-root and root-to-leaf. Mei et al. [21] improved upon this algorithm by incorporating image segmentation [22] in the tree construction procedure. This results in a new tree structure called the segment-tree (ST). Although these methods generate accurate disparity maps for indoor images as shown in [19], [20], [21], they fail to do the same for outdoor driving images. Neither MST nor ST is adequate for outdoor driving images, as will be demonstrated by analysis in the later sections. In this paper, we present a robust non-local algorithm that can meet this challenge. Specifically, we perform cost aggregation on an alternate, effective segment-simple-tree (SST) that is more suitable for outdoor driving images than MST and ST.
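To make the tree construction concrete, the sketch below builds an MST over the 4-connected pixel grid of a small grayscale image using Kruskal's algorithm with a union-find structure. The choice of the absolute intensity difference as the edge weight, the use of Kruskal's algorithm, and the function names are assumptions made for illustration rather than details taken from [19], [20].

```python
def grid_edges(image):
    """4-connected edges as (weight, u, v); weight = absolute intensity difference."""
    h, w = len(image), len(image[0])
    edges = []
    for y in range(h):
        for x in range(w):
            u = y * w + x
            if x + 1 < w:
                edges.append((abs(image[y][x] - image[y][x + 1]), u, u + 1))
            if y + 1 < h:
                edges.append((abs(image[y][x] - image[y + 1][x]), u, u + w))
    return edges


def kruskal_mst(n_nodes, edges):
    """Return the MST edges (weight, u, v) of a connected graph."""
    parent = list(range(n_nodes))

    def find(a):
        # Union-find lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    tree = []
    for weight, u, v in sorted(edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:          # the edge joins two components, so keep it
            parent[root_u] = root_v
            tree.append((weight, u, v))
    return tree
```

With intensity differences as weights, edges that cross strong image boundaries tend to be left out of the tree, so most of the remaining paths connect pixels of similar appearance.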

The remainder of this paper is structured as follows. Section 2 presents the non-local cost aggregation on an MST, which is the inspiration for our algorithm. Section 3 presents the proposed algorithm. Section 4 presents the cost computation, disparity optimization and post-processing. Section 5 presents the experimental results that compare our algorithm to those proposed in prior literature. Section 6 provides the conclusions for this paper.

Section snippets

Non-local cost aggregation on MST

In this section, we present the MST-based non-local algorithm [19], [20], which directly inspired our algorithm. Specifically, we summarize the construction of the MST and the two-pass cost aggregation. The proposed algorithm performs a two-pass cost aggregation similar to that of the MST. However, the tree is constructed differently by employing an alternate, effective segment-simple-tree structure (SST) that is more suitable than MST for outdoor driving images.
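The two-pass aggregation can be sketched as follows for a single disparity level. The sketch assumes the formulation usually given for tree-based non-local aggregation: the similarity between two adjacent nodes is exp(-w / sigma) for edge weight w, the similarity between any two nodes is the product along their tree path, and the aggregated cost of a node is the similarity-weighted sum of the costs of all nodes. The function name, the `(weight, u, v)` edge format, and the parameter `sigma` are illustrative choices, not notation from the paper.

```python
import math
from collections import deque


def tree_aggregate(n_nodes, tree_edges, cost, sigma, root=0):
    """Aggregate per-node costs over a tree in two passes.

    tree_edges: list of (weight, u, v) forming a spanning tree.
    cost: list with one matching cost per node (a single disparity level).
    """
    adjacency = [[] for _ in range(n_nodes)]
    for weight, u, v in tree_edges:
        similarity = math.exp(-weight / sigma)
        adjacency[u].append((v, similarity))
        adjacency[v].append((u, similarity))

    # Breadth-first traversal: `order` lists every parent before its children.
    parent = [None] * n_nodes
    sim_to_parent = [0.0] * n_nodes
    visited = [False] * n_nodes
    visited[root] = True
    order = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbour, similarity in adjacency[node]:
            if not visited[neighbour]:
                visited[neighbour] = True
                parent[neighbour] = node
                sim_to_parent[neighbour] = similarity
                queue.append(neighbour)

    # Pass 1 (leaf-to-root): each node collects the support of its subtree.
    upward = list(cost)
    for node in reversed(order[1:]):
        upward[parent[node]] += sim_to_parent[node] * upward[node]

    # Pass 2 (root-to-leaf): add the support coming from outside the subtree.
    aggregated = list(upward)
    for node in order[1:]:
        s = sim_to_parent[node]
        aggregated[node] = s * aggregated[parent[node]] + (1.0 - s * s) * upward[node]
    return aggregated
```

Each node is visited a constant number of times, so the cost per disparity level is linear in the number of pixels, and the result does not depend on which node is chosen as the root.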

To begin, we denote $G = (V, E)$ as a

Analysis of outdoor driving images

In this subsection, we first analyze the disparity characteristics of outdoor driving images to understand the key idea behind the SST proposed for non-local stereo matching in DAS. Fig. 2 illustrates this analysis using a synthesized stereo image pair obtained from the EISATS dataset [4]. Fig. 2(a) and (b) show the left image and the corresponding ground truth data, respectively. To examine the disparity characteristic on the road surface, we plot the disparities on the road regions in the

Cost computation

As previously mentioned, cost computation is the first step of a general four-step stereo matching pipeline. Cost computation is essential and strongly influences the accuracy of the disparity results. There are several existing metrics for computing matching costs; these metrics include truncated absolute difference (TAD), a combination of TAD using intensity and gradient [9], mutual information (MI) [13], and Census [30]. TAD is one of the simplest measures that has been widely used, but it
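Two of the metrics named above can be sketched in a few lines of Python. The code below is an illustrative sketch only: the truncation threshold `tau`, the window radius, the border handling by coordinate clamping, and the function names are assumptions, not the settings used in the paper.

```python
def truncated_absolute_difference(a, b, tau):
    """TAD between two intensities, truncated at tau to limit the effect of outliers."""
    return min(abs(a - b), tau)


def census_transform(image, radius=1):
    """Encode each pixel as a bit string: 1 where a neighbour is darker than the centre.

    Neighbours outside the image are read from the nearest valid pixel.
    """
    h, w = len(image), len(image[0])
    codes = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            code = 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    if dy == 0 and dx == 0:
                        continue
                    ny = min(max(y + dy, 0), h - 1)
                    nx = min(max(x + dx, 0), w - 1)
                    code = (code << 1) | int(image[ny][nx] < image[y][x])
            codes[y][x] = code
    return codes


def hamming_distance(code_a, code_b):
    """Census matching cost: number of differing bits between two codes."""
    return bin(code_a ^ code_b).count("1")
```

Since the Census code depends only on the ordering of intensities inside the window, it is unchanged by any strictly increasing brightness mapping applied to the whole image, whereas TAD compares raw intensities directly.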

Experimental setup

We compared the proposed non-local algorithm using SST (NL-SST) with the state-of-the-art cost aggregation algorithms including cost-volume filtering (CostFilter) [9], domain transform-based cost aggregation (DTAggr) [17], non-local cost aggregation on an MST (NL-MST) [19], non-local cost aggregation on a segment-tree (NL-ST) [21], and semi-global matching (SGM) [13]. The reasons for this choice are as follows: (1) CostFilter is a state-of-the-art local algorithm, (2) DTAggr, NL-MST, and NL-ST

Conclusions

In this paper, we presented an effective segment-simple-tree for non-local stereo matching in driver assistance systems. In contrast to the original non-local algorithm that only builds a single MST for the entire image, our approach constructs multiple MSTs for non-road segments and some simple-trees for the segments on the road. The cost aggregation is carried out on the proposed SST in two passes in a manner similar to that of the original approach using MST. The advantages of SST are

Acknowledgments

This work was supported by Samsung Research Funding Center of Samsung Electronics under Project Number SRFC-IT1402-12.

References (32)

N. Kiryati et al., A probabilistic Hough transform, Pattern Recognit. (1991)
D. Scharstein et al., A taxonomy and evaluation of dense two-frame stereo correspondence algorithms, Int. J. Comput. Vis. (2002)
R. Szeliski, Computer Vision: Algorithms and Applications (2011)
D. Scharstein, R. Szeliski, Middlebury Stereo Evaluation - Version 2 〈http://vision.middlebury.edu/stereo/eval〉,...
R. Klette et al., Performance of correspondence algorithms in vision-based driver assistance using an online image sequence database, IEEE Trans. Veh. Technol. (2011)
S. Meister, B. Jahne, D. Kondermann, Outdoor stereo camera system for the generation of real-world benchmark data sets,...
A. Geiger, P. Lenz, R. Urtasun, Are we ready for autonomous driving? The KITTI vision benchmark suite, In: Proceedings...
K.-J. Yoon et al., Adaptive support-weight approach for correspondence search, IEEE Trans. Pattern Anal. Mach. Intell. (2006)
L. De-Maeztu et al., Near real-time stereo matching using geodesic diffusion, IEEE Trans. Pattern Anal. Mach. Intell. (2012)
C. Rhemann, A. Hosni, M. Bleyer, C. Rother, M. Gelautz, Fast Cost-Volume Filtering for Visual Correspondence and...
C. Tomasi, R. Manduchi, Bilateral filtering for gray and color images, In: Proceedings of the International Conference...
P. Perona et al., Scale-space and edge detection using anisotropic diffusion, IEEE Trans. Pattern Anal. Mach. Intell. (1990)
K. He, J. Sun, X. Tang, Guided image filtering, In: Proceedings of ECCV 2010 of LNCS, vol. 6311, 2010, pp....
H. Hirschmuller, Stereo processing by semiglobal matching and mutual information, IEEE Trans. Pattern Anal. Mach. Intell. (2008)
M. Bleyer, C. Breiteneder, Stereo matching—state-of-the-art and research challenges, In: Advanced Topics in Computer...
C. Cigla, A.A. Alatan, Efficient edge-preserving stereo matching, In: ICCV Workshop on LDRMV, 2011, pp....

