DOI: 10.1145/3664647.3681372
research-article

Eglcr: Edge Structure Guidance and Scale Adaptive Attention for Iterative Stereo Matching

Published: 28 October 2024

Abstract

Stereo matching is a pivotal technique for depth estimation and is widely applied in computer vision tasks. Although many methods have been reported recently, they still face challenges such as significant disparity variations at object boundaries, difficult prediction in large-disparity regions, and suboptimal generalization when the label distribution varies between source and target domains. We therefore propose a stereo matching model, Eglcr, that uses edge structure information and multi-scale matching similarity features for better disparity estimation. First, we use a lightweight network to predict the initial disparity. Then, we develop a multi-scale similarity feature extraction module, incorporating adaptive attention mechanisms, to capture the fused similarity information of stereo images across scales. Meanwhile, we introduce an edge structure-aware module that features an iteratively optimized disparity map and a scale attention factor, aimed at accurately delineating edge information in complex scenes. After that, we employ an iterative strategy for disparity estimation, guided by the fused multi-scale similarity features and the detailed edge structure information. We conduct extensive experiments on popular stereo matching datasets, including Middlebury, KITTI, ETH3D, and Scene Flow. The results show that the proposed Eglcr achieves state-of-the-art performance in both accuracy and generalization. Our code is available at https://github.com/kangarooCV/Eglcr.
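
To make the pipeline in the abstract concrete, the following is a minimal NumPy sketch of the same four stages: an initial disparity, multi-scale similarity fusion with attention-like weights, an edge map derived from the current disparity, and iterative refinement. It is a toy stand-in and not the authors' implementation: the learned networks are replaced with hand-written absolute-difference costs, box filters, and gradient-based edges, and every function name and parameter here (`cost_volume`, `fuse_scales`, `edge_map`, `refine_disparity`, `radii`, `iters`) is hypothetical. The released code at the URL above is the reference for the actual method.

```python
import numpy as np

def cost_volume(left, right, max_disp):
    """Absolute-difference matching cost, shape (max_disp + 1, H, W).
    Assumes grayscale images with intensities in [0, 1]."""
    h, w = left.shape
    vol = np.ones((max_disp + 1, h, w))  # 1.0 = worst cost where no match exists
    for d in range(max_disp + 1):
        # Left pixel at column x is compared with right pixel at column x - d.
        vol[d, :, d:] = np.abs(left[:, d:] - right[:, : w - d])
    return vol

def box_filter(vol, radius):
    """Average each slice of a (D, H, W) array over a square window."""
    if radius == 0:
        return vol.copy()
    k = 2 * radius + 1
    h, w = vol.shape[1:]
    padded = np.pad(vol, ((0, 0), (radius, radius), (radius, radius)), mode="edge")
    out = np.zeros_like(vol)
    for dy in range(k):
        for dx in range(k):
            out += padded[:, dy:dy + h, dx:dx + w]
    return out / (k * k)

def fuse_scales(vol, radii=(0, 1, 2)):
    """Fuse costs aggregated at several window sizes with softmax weights.
    The weights are a hand-written stand-in for learned adaptive attention."""
    scales = np.stack([box_filter(vol, r) for r in radii])   # (S, D, H, W)
    confidence = -scales.min(axis=1)                         # (S, H, W)
    weights = np.exp(confidence - confidence.max(axis=0))
    weights /= weights.sum(axis=0)
    return (weights[:, None] * scales).sum(axis=0)           # (D, H, W)

def edge_map(disp):
    """Gradient magnitude of the disparity map, used as an edge cue."""
    gy, gx = np.gradient(disp)
    return np.hypot(gx, gy)

def refine_disparity(left, right, max_disp, iters=4):
    """Initial winner-take-all estimate followed by edge-guided refinement."""
    fused = fuse_scales(cost_volume(left, right, max_disp))
    disp = fused.argmin(axis=0).astype(float)    # stand-in for the initial network
    for _ in range(iters):
        edges = edge_map(disp)
        keep = edges / (1.0 + edges)             # close to 1 at strong edges
        smooth = box_filter(disp[None], 1)[0]
        # Smooth flat regions while leaving disparity edges largely untouched.
        disp = keep * disp + (1.0 - keep) * smooth
    return disp
```

Calling `refine_disparity(left, right, max_disp=6)` on two same-sized grayscale arrays returns a floating-point disparity map of the same shape. The design choice illustrated is only the control flow: the edge map is recomputed from the current disparity at every iteration, so edge guidance and disparity estimation inform each other.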

    Published In

    MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
    October 2024
    11719 pages
    ISBN: 9798400706868
    DOI: 10.1145/3664647
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. attention mechanism
    2. depth estimation
    3. edge estimation
    4. feature extraction
    5. stereo matching

    Qualifiers

    • Research-article

    Conference

    MM '24
    MM '24: The 32nd ACM International Conference on Multimedia
    October 28 - November 1, 2024
    Melbourne VIC, Australia

    Acceptance Rates

    MM '24 Paper Acceptance Rate 1,150 of 4,385 submissions, 26%;
    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
