EAI-Stereo: Error Aware Iterative Network for Stereo Matching

Zhao, Haoliang; Zhou, Huizhou; Zhang, Yongjun; Zhao, Yong; Yang, Yitong; Ouyang, Ting

doi:10.1007/978-3-031-26319-4_1

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13841)

Included in the following conference series: Asian Conference on Computer Vision

Abstract

Current state-of-the-art stereo algorithms use a 2D CNN to extract features and then form a cost volume, which is fed into a subsequent cost aggregation and regularization module composed of 2D or 3D CNNs. However, a large amount of high-frequency information, such as texture, color variation, and sharp edges, is not well exploited during this process, which leads to relatively blurry disparity maps that lack detail. In this paper, we aim to make full use of the high-frequency information from the original image. To this end, we propose an error-aware refinement module that incorporates high-frequency information from the original left image and allows the network to learn error correction capabilities that produce fine details and sharp edges. To improve the data transfer efficiency between iterations, we propose the Iterative Multiscale Wide-LSTM Network, which can carry more semantic information across iterations. We demonstrate the efficiency and effectiveness of our method on KITTI 2015, Middlebury, and ETH3D. At the time of writing, EAI-Stereo ranks 1st on the Middlebury leaderboard, and on the ETH3D Stereo benchmark it ranks 1st for the 50% quantile metric and 2nd for the 0.5px error rate among all published methods. Our model performs well in cross-domain scenarios and outperforms current methods specifically designed for generalization. Code is available at https://github.com/David-Zhao-1997/EAI-Stereo.
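
As a rough illustration of the "form a cost volume" step mentioned in the abstract, the following is a minimal NumPy sketch of a generic correlation cost volume followed by a winner-take-all disparity readout. It is not the EAI-Stereo implementation: the function names `correlation_cost_volume` and `winner_take_all`, the `(C, H, W)` feature layout, and the `max_disp` parameter are assumptions made for this example only.

```python
import numpy as np


def correlation_cost_volume(left_feat, right_feat, max_disp):
    """Build a (max_disp, H, W) correlation volume from (C, H, W) features.

    Hypothetical helper for illustration; entry [d, y, x] is the mean
    channel-wise product of the left feature at x and the right feature
    at x - d. Positions where x - d falls outside the image stay at zero.
    """
    _, h, w = left_feat.shape
    volume = np.zeros((max_disp, h, w), dtype=np.float32)
    for d in range(max_disp):
        if d == 0:
            volume[d] = (left_feat * right_feat).mean(axis=0)
        else:
            # A left pixel at column x is compared with the right pixel at x - d.
            volume[d, :, d:] = (left_feat[:, :, d:] * right_feat[:, :, :-d]).mean(axis=0)
    return volume


def winner_take_all(volume):
    """Pick, per pixel, the disparity index with the highest correlation."""
    return volume.argmax(axis=0)
```

A readout like `winner_take_all` yields only integer disparities and ignores image detail, which is one way to see why the abstract argues for a refinement stage that draws on the original left image.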

H. Zhao and H. Zhou contributed equally.

Acknowledgements

This work is supported by Shenzhen Fundamental Research Program (JCYJ20180503182133411).

Author information

Authors and Affiliations

State Key Laboratory of Public Big Data, Institute for Artificial Intelligence, College of Computer Science and Technology, Guizhou University, Guiyang, 550025, Guizhou, China
Haoliang Zhao, Yongjun Zhang, Yong Zhao, Yitong Yang & Ting Ouyang
School of Physics and Optoelectronic Engineering, Guangdong University of Technology, Guangzhou, 510006, China
Huizhou Zhou
The Key Laboratory of Integrated Microsystems, Shenzhen Graduate School, Peking University, Beijing, China
Yong Zhao
Ghost-Valley AI Technology, Shenzhen, Guangdong, China
Haoliang Zhao, Huizhou Zhou & Yong Zhao

Corresponding author

Correspondence to Yongjun Zhang.

Editor information

Editors and Affiliations

University of Wollongong, Wollongong, NSW, Australia
Lei Wang
University of Bonn, Bonn, Germany
Juergen Gall
University of Adelaide, Adelaide, SA, Australia
Tat-Jun Chin
National Institute of Informatics, Tokyo, Japan
Imari Sato
Johns Hopkins University, Baltimore, MD, USA
Rama Chellappa

Cite this paper

Zhao, H., Zhou, H., Zhang, Y., Zhao, Y., Yang, Y., Ouyang, T. (2023). EAI-Stereo: Error Aware Iterative Network for Stereo Matching. In: Wang, L., Gall, J., Chin, TJ., Sato, I., Chellappa, R. (eds) Computer Vision – ACCV 2022. ACCV 2022. Lecture Notes in Computer Science, vol 13841. Springer, Cham. https://doi.org/10.1007/978-3-031-26319-4_1

DOI: https://doi.org/10.1007/978-3-031-26319-4_1
Published: 04 March 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-26318-7
Online ISBN: 978-3-031-26319-4
eBook Packages: Computer Science, Computer Science (R0)
