Skip to main content

EAI-Stereo: Error Aware Iterative Network for Stereo Matching

  • Conference paper
  • First Online:
Computer Vision – ACCV 2022 (ACCV 2022)

Abstract

Current state-of-the-art stereo algorithms use a 2D CNN to extract features and then form a cost volume, which is fed into the following cost aggregation and regularization module composed of 2D or 3D CNNs. However, a large amount of high-frequency information like texture, color variation, sharp edge etc. is not well exploited during this process, which leads to relatively blurry and lacking detailed disparity maps. In this paper, we aim at making full use of the high-frequency information from the original image. Towards this end, we propose an error-aware refinement module that incorporates high-frequency information from the original left image and allows the network to learn error correction capabilities that can produce excellent subtle details and sharp edges. In order to improve the data transfer efficiency between our iterations, we propose the Iterative Multiscale Wide-LSTM Network which could carry more semantic information across iterations. We demonstrate the efficiency and effectiveness of our method on KITTI 2015, Middlebury, and ETH3D. At the time of writing this paper, EAI-Stereo ranks \({1^{st}}\) on the Middlebury leaderboard and \({1^{st}}\) on the ETH3D Stereo benchmark for 50% quantile metric and second for 0.5px error rate among all published methods. Our model performs well in cross-domain scenarios and outperforms current methods specifically designed for generalization. Code is available at https://github.com/David-Zhao-1997/EAI-Stereo.

H. Zhao and H. Zhou—These authors contributed equally.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Bleyer, M., Gelautz, M.: Simple but effective tree structures for dynamic programming-based stereo matching. In: International Conference on Computer Vision Theory and Applications, vol. 2, pp. 415–422. SCITEPRESS (2008)

    Google Scholar 

  2. Chang, J.R., Chen, Y.S.: Pyramid stereo matching network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5410–5418 (2018)

    Google Scholar 

  3. Cho, K., van Merriënboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, pp. 103–111. Association for Computational Linguistics, October 2014. https://doi.org/10.3115/v1/W14-4012

  4. Dai, J., et al.: Deformable convolutional networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 764–773 (2017)

    Google Scholar 

  5. Dinh, V.Q., Munir, F., Sheri, A.M., Jeon, M.: Disparity estimation using stereo images with different focal lengths. IEEE Trans. Intell. Transp. Syst. 21(12), 5258–5270 (2019)

    Article  Google Scholar 

  6. Dosovitskiy, A., et al.: Flownet: learning optical flow with convolutional networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2758–2766 (2015)

    Google Scholar 

  7. Du, X., El-Khamy, M., Lee, J.: AmNet: deep atrous multiscale stereo disparity estimation networks. arXiv preprint arXiv:1904.09099 (2019)

  8. Felzenszwalb, P.F., Huttenlocher, D.P.: Efficient belief propagation for early vision. Int. J. Comput. Vision 70(1), 41–54 (2006)

    Article  Google Scholar 

  9. Fife, W.S., Archibald, J.K.: Improved census transforms for resource-optimized stereo vision. IEEE Trans. Circuits Syst. Video Technol. 23(1), 60–73 (2012)

    Article  Google Scholar 

  10. Guo, X., Yang, K., Yang, W., Wang, X., Li, H.: Group-wise correlation stereo network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3273–3282 (2019)

    Google Scholar 

  11. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016). https://doi.org/10.1109/CVPR.2016.90

  12. Heo, Y.S., Lee, K.M., Lee, S.U.: Robust stereo matching using adaptive normalized cross-correlation. IEEE Trans. Pattern Anal. Mach. Intell. 33(4), 807–822 (2010)

    Google Scholar 

  13. Hirschmuller, H.: Stereo processing by semiglobal matching and mutual information. IEEE Trans. Pattern Anal. Mach. Intell. 30(2), 328–341 (2007)

    Article  Google Scholar 

  14. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)

    Article  Google Scholar 

  15. Hosni, A., Rhemann, C., Bleyer, M., Rother, C., Gelautz, M.: Fast cost-volume filtering for visual correspondence and beyond. IEEE Trans. Pattern Anal. Mach. Intell. 35(2), 504–511 (2012)

    Article  Google Scholar 

  16. Hur, J., Roth, S.: Iterative residual refinement for joint optical flow and occlusion estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019

    Google Scholar 

  17. Ji, P., Li, J., Li, H., Liu, X.: Superpixel alpha-expansion and normal adjustment for stereo matching. J. Vis. Commun. Image Represent. 79, 103238 (2021)

    Article  Google Scholar 

  18. Kendall, A., et al.: End-to-end learning of geometry and context for deep stereo regression. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 66–75 (2017)

    Google Scholar 

  19. Klaus, A., Sormann, M., Karner, K.: Segment-based stereo matching using belief propagation and a self-adapting dissimilarity measure. In: 18th International Conference on Pattern Recognition (ICPR 2006), vol. 3, pp. 15–18. IEEE (2006)

    Google Scholar 

  20. Kolmogorov, V., Zabih, R.: Computing visual correspondence with occlusions using graph cuts. In: Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001, vol. 2, pp. 508–515. IEEE (2001)

    Google Scholar 

  21. Krause, B., Lu, L., Murray, I., Renals, S.: Multiplicative LSTM for sequence modelling. arXiv preprint arXiv:1609.07959 (2016)

  22. Li, J., et al.: Practical stereo matching via cascaded recurrent network with adaptive correlation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 16263–16272 (2022)

    Google Scholar 

  23. Lipson, L., Teed, Z., Deng, J.: Raft-stereo: multilevel recurrent field transforms for stereo matching. In: 2021 International Conference on 3D Vision (3DV), pp. 218–227. IEEE (2021)

    Google Scholar 

  24. Mayer, N., et al.: A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4040–4048 (2016)

    Google Scholar 

  25. Melis, G., Kočiskỳ, T., Blunsom, P.: Mogrifier lstm. arXiv preprint arXiv:1909.01792 (2019)

  26. Menze, M., Geiger, A.: Object scene flow for autonomous vehicles. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3061–3070 (2015)

    Google Scholar 

  27. Neshatpour, K., Behnia, F., Homayoun, H., Sasan, A.: ICNN: an iterative implementation of convolutional neural networks to enable energy and computational complexity aware dynamic approximation. In: 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 551–556. IEEE (2018)

    Google Scholar 

  28. Scharstein, D., et al.: High-resolution stereo datasets with subpixel-accurate ground truth. In: Jiang, X., Hornegger, J., Koch, R. (eds.) GCPR 2014. LNCS, vol. 8753, pp. 31–42. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11752-2_3

    Chapter  Google Scholar 

  29. Scharstein, D., Szeliski, R.: A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. Int. J. Comput. Vision 47(1), 7–42 (2002)

    Article  MATH  Google Scholar 

  30. Schöps, T., et al.: A multi-view stereo benchmark with high-resolution images and multi-camera videos. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)

    Google Scholar 

  31. Shen, Z., Dai, Y., Rao, Z.: CFNet: cascade and fused cost volume for robust stereo matching. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 13906–13915 (2021)

    Google Scholar 

  32. Sun, D., Yang, X., Liu, M.Y., Kautz, J.: PWC-Net: CNNs for optical flow using pyramid, warping, and cost volume. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8934–8943 (2018)

    Google Scholar 

  33. Sun, J., Zheng, N.N., Shum, H.Y.: Stereo matching using belief propagation. IEEE Trans. Pattern Anal. Mach. Intell. 25(7), 787–800 (2003)

    Article  MATH  Google Scholar 

  34. Taniai, T., Matsushita, Y., Sato, Y., Naemura, T.: Continuous 3d label stereo matching using local expansion moves. IEEE Trans. Pattern Anal. Mach. Intell. 40(11), 2725–2739 (2017)

    Article  Google Scholar 

  35. Tankovich, V., Hane, C., Zhang, Y., Kowdle, A., Fanello, S., Bouaziz, S.: HitNet: hierarchical iterative tile refinement network for real-time stereo matching. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 14362–14372 (2021)

    Google Scholar 

  36. Teed, Z., Deng, J.: RAFT: Recurrent all-pairs field transforms for optical flow. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12347, pp. 402–419. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58536-5_24

    Chapter  Google Scholar 

  37. Tonioni, A., Tosi, F., Poggi, M., Mattoccia, S., Stefano, L.D.: Real-time self-adaptive deep stereo. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 195–204 (2019)

    Google Scholar 

  38. Wang, F., Galliani, S., Vogel, C., Pollefeys, M.: IterMVS: iterative probability estimation for efficient multi-view stereo (2022)

    Google Scholar 

  39. Wang, H., Fan, R., Cai, P., Liu, M.: Pvstereo: pyramid voting module for end-to-end self-supervised stereo matching. IEEE Robot. Autom. Lett. 6(3), 4353–4360 (2021)

    Article  Google Scholar 

  40. Xu, G., Cheng, J., Guo, P., Yang, X.: ACVNet: attention concatenation volume for accurate and efficient stereo matching. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2022)

    Google Scholar 

  41. Xu, H., Zhang, J.: AANet: adaptive aggregation network for efficient stereo matching. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1959–1968 (2020)

    Google Scholar 

  42. Yao, Y., Luo, Z., Li, S., Fang, T., Quan, L.: MVSNet: depth inference for unstructured multi-view stereo. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11212, pp. 785–801. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01237-3_47

    Chapter  Google Scholar 

  43. Yin, Z., Darrell, T., Yu, F.: Hierarchical discrete distribution decomposition for match density estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6044–6053 (2019)

    Google Scholar 

  44. Yoon, K.J., Kweon, I.S.: Adaptive support-weight approach for correspondence search. IEEE Trans. Pattern Anal. Mach. Intell. 28(4), 650–656 (2006)

    Article  Google Scholar 

  45. Yu, F., Koltun, V.: Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122 (2015)

  46. Zbontar, J., LeCun, Y.: Computing the stereo matching cost with a convolutional neural network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1592–1599 (2015)

    Google Scholar 

  47. Zhang, F., Prisacariu, V., Yang, R., Torr, P.S.: GA-Net: Guided aggregation net for end-to-end stereo matching. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Los Alamitos, CA, USA, pp. 185–194. IEEE Computer Society, June 2019. https://doi.org/10.1109/CVPR.2019.00027, https://doi.ieeecomputersociety.org/10.1109/CVPR.2019.00027

  48. Zhang, F., Qi, X., Yang, R., Prisacariu, V., Wah, B., Torr, P.: Domain-invariant stereo matching networks. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) Domain-invariant stereo matching networks. LNCS, vol. 12347, pp. 420–439. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58536-5_25

    Chapter  Google Scholar 

  49. Zhang, Y., et al.: Adaptive unimodal cost volume filtering for deep stereo matching. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 12926–12934 (2020)

    Google Scholar 

Download references

Acknowledgements

This work is supported by Shenzhen Fundamental Research Program (JCYJ20180503182133411).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Yongjun Zhang .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Zhao, H., Zhou, H., Zhang, Y., Zhao, Y., Yang, Y., Ouyang, T. (2023). EAI-Stereo: Error Aware Iterative Network for Stereo Matching. In: Wang, L., Gall, J., Chin, TJ., Sato, I., Chellappa, R. (eds) Computer Vision – ACCV 2022. ACCV 2022. Lecture Notes in Computer Science, vol 13841. Springer, Cham. https://doi.org/10.1007/978-3-031-26319-4_1

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-26319-4_1

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-26318-7

  • Online ISBN: 978-3-031-26319-4

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics