Stereo Matching—State-of-the-Art and Research Challenges

Advanced Topics in Computer Vision

Abstract

Stereo matching denotes the problem of finding dense correspondences in pairs of images in order to perform 3D reconstruction. In this chapter, we provide a review of stereo methods with a focus on recent developments and our own work. We start with a discussion of local methods and introduce our algorithms: geodesic stereo, cost filtering and PatchMatch stereo. Although local algorithms have recently become very popular, they are not capable of handling large untextured regions where a global smoothness prior is required. In the discussion of such global methods, we briefly describe standard optimization techniques. However, the real problem is not in the optimization, but in finding an energy function that represents a good model of the stereo problem. In this context, we investigate the data and smoothness terms of standard energies to find the best-suited implementation of each. We then describe our own work on finding a good model. This includes our combined stereo and matting approach, Surface Stereo, Object Stereo, as well as a new method that incorporates physics-based reasoning in stereo matching.

Notes

  1. The distance between the cameras is thereby referred to as the stereo baseline.

  2. Rectification can be accomplished using standard methods (e.g., [29]), once the stereo camera system has been calibrated.

  3. This is the reason why local algorithms are also sometimes referred to as area-based methods in the literature.

  4. It is interesting to note that the majority of recent submissions to the Middlebury benchmark [67] are adaptive support weight techniques.

  5. Hirschmüller stresses a relationship to local methods. He calls the method semi-global matching, as he uses the aggregation step of local methods, but aggregates (global) DP path costs instead of pixel dissimilarities of spatially close pixels.

  6. It is likely that the approach would run in real time if implemented on a modern GPU.

  7. Note that although we speak about disparity, these move-making algorithms can be applied to arbitrary computer vision labeling problems outside stereo vision.

  8. This is why these algorithms are referred to as graph-cuts.

  9. Due to the NP-hardness of the problem, QPBO can only guarantee to find a part of the globally optimal solution and will, in general, leave a subset of pixels unlabeled. The autarky property of QPBO guarantees that by assigning unlabeled pixels to the disparities of proposal 1, the energy of the fusion result will be lower than or equal to that of proposal 1. Alternative ways of handling these unlabeled pixels are QPBO-I and QPBO-P [64].

  10. In practice, it is sufficient to compute the gradient in x-direction, as vertical edges contain more disparity information than horizontal ones.

  11. Hirschmüller [32] uses a hierarchical approach to speed up this iterative procedure.

  12. In our study, the extension of AD and SD to color works by computing AD and SD for each color channel separately. We then sum up the differences over the 3 color channels.

  13. The stereo problem is symmetrical.

  14. In general, such a construction works if the smoothness term is convex [40].

  15. Note that in [81], the smoothness term is also truncated to make it discontinuity preserving.

  16. There are also alternative methods (e.g., cooperative cuts [41]).

  17. It is interesting to note some similarity to cost filter approaches discussed in Sect. 6.2.

  18. This term forms a higher-order clique in the optimization step. In general, optimization of such higher-order cliques is intractable. We take advantage of the fact that these cliques are sparse, i.e., only one state generates 0 costs and all other states generate constant costs. There exist graph-cut based algorithms that can “efficiently” optimize such sparse higher-order cliques [44, 65] and we make use of them.

  19. By optimal we mean the solution that leads to the largest energy decrease according to our energy function among all possible fusion moves.

  20. Small number means that we apply the MDL term above to penalize the number of objects.
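
The color extension described in note 12 can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the representation of images as nested lists of RGB tuples, and the convention that pixel (x, y) in the left image is compared with (x - d, y) in the right image are assumptions made for illustration only.

```python
def color_ad(p, q):
    """Absolute difference (AD), summed over the color channels of pixels p and q."""
    return sum(abs(a - b) for a, b in zip(p, q))


def color_sd(p, q):
    """Squared difference (SD), summed over the color channels of pixels p and q."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def pixel_cost(left, right, x, y, d, measure=color_ad):
    """Dissimilarity of left pixel (x, y) and right pixel (x - d, y).

    Assumption: rectified images stored as rows of (R, G, B) tuples,
    with disparity d shifting the match to the left in the right image.
    """
    xr = x - d
    if not 0 <= xr < len(right[y]):
        raise ValueError("disparity maps the pixel outside the right image")
    return measure(left[y][x], right[y][xr])


# Tiny one-row example images (hypothetical values).
left = [[(10, 20, 30), (200, 100, 50)]]
right = [[(198, 101, 50), (0, 0, 0)]]

print(pixel_cost(left, right, x=1, y=0, d=1))                    # AD over 3 channels
print(pixel_cost(left, right, x=1, y=0, d=1, measure=color_sd))  # SD over 3 channels
```

Passing the measure as a parameter keeps the per-channel computation separate from the disparity lookup, so either dissimilarity can be swapped in without touching the rest of the sketch.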

References

  1. Agarwal S, Snavely N, Simon I, Seitz S, Szeliski R (2009) Building Rome in a day. In: ICCV

  2. Baker S, Szeliski R, Anandan P (1998) A layered approach to stereo reconstruction. In: CVPR, pp 434–441

  3. Baker S, Scharstein D, Lewis J, Roth S, Black M, Szeliski R (2011) A database and evaluation methodology for optical flow. Int J Comput Vis 92(1):1–31

  4. Barnes C, Shechtman E, Finkelstein A, Goldman D (2009) PatchMatch: a randomized correspondence algorithm for structural image editing. ACM Trans Graph 28(3):24 (SIGGRAPH Proc)

  5. Birchfield S, Tomasi C (1998) A pixel dissimilarity measure that is insensitive to image sampling. IEEE Trans Pattern Anal Mach Intell 20(4):401–406

  6. Birchfield S, Tomasi C (1999) Depth discontinuities by pixel-to-pixel stereo. Int J Comput Vis 35(3):269–293

  7. Bleyer M, Chambon S (2010) Does color really help in dense stereo matching? In: International symposium on 3D data processing, visualization and transmission (3DPVT)

  8. Bleyer M, Gelautz M (2005) A layered stereo matching algorithm using image segmentation and global visibility constraints. ISPRS J Photogramm Remote Sens 59(3):128–150

  9. Bleyer M, Gelautz M (2007) Graph-cut-based stereo matching using image segmentation with symmetrical treatment of occlusions. Signal Process Image Commun 22(2):127–143

  10. Bleyer M, Gelautz M (2008) Simple but effective tree structures for dynamic programming-based stereo matching. In: VISAPP, vol 2, pp 415–422

  11. Bleyer M, Chambon S, Poppe U, Gelautz M (2008) Evaluation of different methods for using colour information in global stereo matching. In: International archives of the photogrammetry, remote sensing and spatial information sciences, vol XXXVII, pp 415–422

  12. Bleyer M, Gelautz M, Rother C, Rhemann C (2009) A stereo approach that handles the matting problem via image warping. In: CVPR, pp 501–508

  13. Bleyer M, Rother C, Kohli P (2010) Surface stereo with soft segmentation. In: CVPR

  14. Bleyer M, Rhemann C, Rother C (2011) PatchMatch stereo—stereo matching with slanted support windows. In: BMVC

  15. Bleyer M, Rother C, Kohli P, Scharstein D, Sinha S (2011) Object stereo—joint stereo matching and object segmentation. In: CVPR

  16. Bleyer M, Rhemann C, Rother C (2012) Extracting 3D scene-consistent object proposals and depth from stereo images. In: ECCV

  17. Bobick A, Intille S (1999) Large occlusion stereo. Int J Comput Vis 33(3):181–200

  18. Boykov Y, Veksler O, Zabih R (2001) Fast approximate energy minimization via graph cuts. IEEE Trans Pattern Anal Mach Intell 23(11):1222–1239

  19. Carreira J, Li F, Sminchisescu C (2012) Object recognition by sequential figure-ground ranking. Int J Comput Vis 98(3):243–262

  20. Deng Y, Yang Q, Lin X, Tang X (2005) A symmetric patch-based correspondence model for occlusion handling. In: ICCV, pp 542–567

  21. Faugeras O, Hotz B, Mathieu H, Viéville T, Zhang Z, Fua P, Théron E, Moll L, Berry G, Vuillemin J, Bertin P, Proy C (1996) Real time correlation based stereo: algorithm implementations and applications. Technical report, RR-2013, INRIA

  22. Felzenszwalb P, Huttenlocher D (2006) Efficient belief propagation for early vision. Int J Comput Vis 70(1):41–54

  23. Fua PV (1991) Combining stereo and monocular information to compute dense depth maps that preserve depth discontinuities. In: International joint conference on artificial intelligence, pp 1292–1298

  24. Fusiello A, Roberto V, Trucco E (1997) Efficient stereo with multiple windowing. In: CVPR, pp 858–863

  25. Gallup D, Frahm J, Mordohai P, Yang Q, Pollefeys M (2007) Real-time plane-sweeping stereo with multiple sweeping directions. In: CVPR

  26. Gehrig S, Franke U (2007) Improving sub-pixel accuracy for long range stereo. In: ICCV VRML workshop

  27. Geiger A, Lenz P, Urtasun R (2012) Are we ready for autonomous driving? The KITTI vision benchmark suite. In: CVPR

  28. Gupta A, Efros A, Hebert M (2010) Blocks world revisited: image understanding using qualitative geometry and mechanics. In: ECCV

  29. Hartley R, Zisserman A (2003) Multiple view geometry in computer vision

  30. Hasinoff S, Kang SB, Szeliski R (2006) Boundary matting for view synthesis. Comput Vis Image Underst 103(1):22–32

  31. He K, Sun J, Tang X (2010) Guided image filtering. In: ECCV

  32. Hirschmüller H (2005) Accurate and efficient stereo processing by semi-global matching and mutual information. In: CVPR, vol 2, pp 807–814

  33. Hirschmüller H, Scharstein D (2009) Evaluation of stereo matching costs on images with radiometric differences. IEEE Trans Pattern Anal Mach Intell 31:1582–1599

  34. Hirschmüller H, Innocent P, Garibaldi J (2002) Real-time correlation-based stereo vision with reduced border errors. Int J Comput Vis 47:229–246

  35. Hong L, Chen G (2004) Segment-based stereo matching using graph cuts. In: CVPR, vol 1, pp 74–81

  36. Hosni A, Bleyer M, Gelautz M, Rhemann C (2009) Local stereo matching using geodesic support weights. In: ICIP

  37. Hosni A, Bleyer M, Gelautz M (2010) Near real-time stereo with adaptive support weight approaches. In: 3DPVT

  38. Hosni A, Rhemann C, Bleyer M, Gelautz M (2011) Temporally consistent disparity and optical flow via efficient spatio-temporal filtering. In: PSIVT

  39. Hosni A, Rhemann C, Bleyer M, Rother C, Gelautz M (2013) Fast cost-volume filtering for visual correspondence and beyond. IEEE Trans Pattern Anal Mach Intell 35(2):504–511

  40. Ishikawa H (2000) Global optimization using embedded graphs. PhD thesis, New York University

  41. Jegelka S, Bilmes J (2011) Submodularity beyond submodular energies: coupling edges in graph cuts. In: CVPR

  42. Ju M, Kang H (2009) Constant time stereo matching. In: MVIP, pp 13–17

  43. Klaus A, Sormann M, Karner K (2006) Segment-based stereo matching using belief propagation and a self-adapting dissimilarity measure. In: ICPR, pp 15–18

  44. Kohli P, Kumar M, Torr P (2007) P3 & beyond: solving energies with higher order cliques. In: CVPR

  45. Kolmogorov V, Rother C (2007) Minimizing non-submodular functions with graph cuts—a review. IEEE Trans Pattern Anal Mach Intell 29(7):1274–1279

  46. Kolmogorov V, Zabih R (2002) Computing visual correspondence with occlusions using graph cuts. In: ICCV, vol 2, pp 508–515

  47. Kolmogorov V, Zabih R (2002) Multi-camera scene reconstruction via graph cuts. In: ECCV

  48. Kolmogorov V, Zabih R (2004) What energy functions can be minimized via graph cuts? IEEE Trans Pattern Anal Mach Intell 26(2):147–159

  49. Krähenbühl P, Koltun V (2011) Efficient inference in fully connected CRFs with gaussian edge potentials. In: Advances in neural information processing systems

  50. Lempitsky V, Rother C, Blake A (2007) Logcut—efficient graph cut optimization for Markov random fields. In: ICCV

  51. Li G, Zucker SW (2006) Surface geometric constraints for stereo in belief propagation. In: CVPR, pp 2355–2362

  52. Lin M, Tomasi C (2003) Surfaces with occlusions from layered stereo. In: CVPR, pp 710–717

  53. Mei X, Sun X, Zhou M, Jiao S, Wang H, Zhang X (2011) On building an accurate stereo matching system on graphics hardware. In: GPUCV, pp 467–474

  54. Meltzer T, Yanover TC, Weiss Y (2005) Globally optimal solutions for energy minimization in stereo vision using reweighted belief propagation. In: ICCV, pp 428–435

  55. Mühlmann K, Maier D, Hesser J, Männer R (2002) Calculating dense disparity maps from color stereo images, an efficient implementation. Int J Comput Vis 47(1):79–88

  56. Newcombe R, Izadi S, Hilliges O, Molyneaux D, Kim D, Davison A, Kohli P, Shotton J, Hodges S, Fitzgibbon A (2011) KinectFusion: real-time dense surface mapping and tracking. In: ISMAR

  57. Ogale AS, Aloimonos Y (2004) Stereo correspondence with slanted surfaces: critical implications of horizontal slant. In: CVPR, pp 568–573

  58. Ohta Y, Kanade T (1985) Stereo by intra- and inter-scanline search. IEEE Trans Pattern Anal Mach Intell 7(2):139–154

  59. Paris S, Durand F (2009) A fast approximation of the bilateral filter using a signal processing approach. Int J Comput Vis 81:24–52

  60. Porikli F (2005) Integral histogram: a fast way to extract histograms in cartesian spaces. In: CVPR, vol 1, pp 829–836

  61. Rhemann C, Hosni A, Bleyer M, Rother C, Gelautz M (2011) Fast cost-volume filtering for visual correspondence and beyond. In: CVPR

  62. Richardt C, Orr D, Davies I, Criminisi A, Dodgson NA (2010) Real-time spatiotemporal stereo matching using the dual-cross-bilateral grid. In: ECCV, vol 6313, pp 510–523

  63. Rother C, Kolmogorov V, Blake A (2004) GrabCut: interactive foreground extraction using iterated graph cuts. ACM Trans Graph 23:309–314

  64. Rother C, Kolmogorov V, Lempitsky V, Szummer M (2007) Optimizing binary MRFs via extended roof duality. In: CVPR

  65. Rother C, Kohli P, Feng W, Jia J (2009) Minimizing sparse higher order energy functions of discrete variables. In: CVPR, pp 1382–1389

  66. Roy S, Cox I (1998) A maximum-flow formulation of the n-camera stereo correspondence problem. In: ICCV, pp 492–499

  67. Scharstein D, Szeliski R (2002) A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. Int J Comput Vis 47(1/2/3):7–42. http://vision.middlebury.edu/stereo/

  68. Shotton J, Fitzgibbon A, Cook M, Sharp T, Finocchio M, Moore R, Kipman A, Blake A (2011) Real-time human pose recognition in parts from a single depth image. In: CVPR

  69. Smith B, Zhang L, Jin H (2009) Stereo matching with nonparametric smoothness priors in feature space. In: CVPR, pp 485–492

  70. Snavely N, Seitz S, Szeliski R (2006) Photo tourism: exploring photo collections in 3D. ACM Trans Graph 25:835–846 (SIGGRAPH Proc)

  71. Sun J, Zheng NN, Shum HY (2003) Stereo matching using belief propagation. IEEE Trans Pattern Anal Mach Intell 25(7):787–800

  72. Sun J, Li Y, Kang SB, Shum HY (2005) Symmetric stereo matching for occlusion handling. In: CVPR, vol 25, pp 399–406

  73. Szeliski R, Golland P (1998) Stereo matching with transparency and matting. In: ICCV, pp 517–525

  74. Szeliski R, Zabih R, Scharstein D, Veksler O, Kolmogorov V, Agarwala A, Tappen M, Rother C (2006) A comparative study of energy minimization methods for Markov random fields. In: ECCV, vol 2, pp 19–26

  75. Taguchi Y, Wilburn B, Zitnick L (2008) Stereo reconstruction with mixed pixels using adaptive over-segmentation. In: CVPR, pp 1–8

  76. Tao H, Sawhney H, Kumar R (2001) A global matching framework for stereo computation. In: ICCV, pp 532–539

  77. Tappen MF, Freeman WT (2003) Comparison of graph cuts with belief propagation for stereo, using identical MRF parameters. In: ICCV, vol 2, pp 900–906

  78. Veksler O (2002) Stereo correspondence with compact windows via minimum ratio cycle. IEEE Trans Pattern Anal Mach Intell 24(12):1654–1660

  79. Veksler O (2005) Stereo correspondence by dynamic programming on a tree. In: CVPR, pp 384–390

  80. Wainwright M, Jaakkola T, Willsky A (2003) Tree reweighted belief propagation algorithms and approximate ML estimation by pseudo-moment matching. In: AISTATS

  81. Woodford O, Torr P, Reid I, Fitzgibbon A (2008) Global stereo reconstruction under second order smoothness priors. In: CVPR

  82. Xiong W, Jia J (2007) Stereo matching on objects with fractional boundary. In: CVPR, pp 1–8

  83. Yang Q, Wang L, Yang R, Wang S, Liao M, Nister D (2006) Real-time global stereo matching using hierarchical belief propagation. In: BMVC

  84. Yang Q, Yang R, Davis J, Nister D (2007) Spatial-depth super resolution for range images. In: CVPR

  85. Yang Q, Wang L, Yang R, Stewenius H, Nister D (2009) Stereo matching with color-weighted correlation, hierarchical belief propagation and occlusion handling. IEEE Trans Pattern Anal Mach Intell 31(3):492–504

  86. Yoon KJ, Kweon IS (2005) Locally adaptive support-weight approach for visual correspondence search. In: CVPR

  87. Zhang Y, Gong M, Yang Y (2008) Local stereo matching with 3D adaptive cost aggregation for slanted surface modeling and sub-pixel accuracy. In: ICPR

  88. Zhang K, Lu J, Lafruit G (2009) Cross-based local stereo matching using orthogonal integral images. IEEE Trans Circuits Syst Video Technol 19:1073–1079

  89. Zhang K, Lafruit G, Lauwereins R, Gool L (2010) Joint integral histograms and its application in stereo matching. In: ICIP, pp 817–820

  90. Zitnick L, Kang S, Uyttendaele M, Winder S, Szeliski R (2004) High-quality video view interpolation using a layered representation. ACM Trans Graph 23(3):600–608

Acknowledgements

This work was supported in part by the Vienna Science and Technology Fund (WWTF) under project ICT08-019.

Author information

Corresponding author

Correspondence to Michael Bleyer.

Copyright information

© 2013 Springer-Verlag London

About this chapter

Cite this chapter

Bleyer, M., Breiteneder, C. (2013). Stereo Matching—State-of-the-Art and Research Challenges. In: Farinella, G., Battiato, S., Cipolla, R. (eds) Advanced Topics in Computer Vision. Advances in Computer Vision and Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-4471-5520-1_6

  • DOI: https://doi.org/10.1007/978-1-4471-5520-1_6

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-5519-5

  • Online ISBN: 978-1-4471-5520-1

  • eBook Packages: Computer Science (R0)
