Eglcr: Edge Structure Guidance and Scale Adaptive Attention for Iterative Stereo Matching

Authors:

Yongfang Xie

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia

Pages 4197 - 4206

https://doi.org/10.1145/3664647.3681372

Published: 28 October 2024

Abstract

Stereo matching is a pivotal technique for depth estimation and is widely applied in computer vision tasks. Although many methods have been reported recently, they still face challenges such as significant disparity variations at object boundaries, difficult prediction in large-disparity regions, and suboptimal generalization when the label distribution varies between source and target domains. We therefore propose a stereo matching model, Eglcr, that uses edge structure information and multi-scale matching similarity features for better disparity estimation. First, we use a lightweight network to predict the initial disparity. Then, we develop a multi-scale similarity feature extraction module, incorporating adaptive attention mechanisms, to capture the fused similarity information of stereo images across various scales. Meanwhile, we introduce an edge structure-aware module that features an iteratively optimized disparity map and a scale attention factor, aimed at accurately delineating edge information in complex scenes. After that, we employ an iterative strategy for disparity estimation, guided by the fused multi-scale similarity features and the detailed edge structure information. We conduct extensive experiments on popular stereo matching datasets, including Middlebury, KITTI, ETH3D, and Scene Flow. The results show that Eglcr achieves state-of-the-art performance in both accuracy and generalization. Our code is available at https://github.com/kangarooCV/Eglcr.

Index Terms

Computing methodologies → Artificial intelligence → Computer vision → Computer vision tasks → Scene understanding

Published In

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia

October 2024

11719 pages

ISBN: 9798400706868

DOI: 10.1145/3664647

General Chairs: Jianfei Cai (Monash University, Australia), Mohan Kankanhalli (NUS, Singapore), Balakrishnan Prabhakaran (UT Dallas, USA), Susanne Boll (University of Oldenburg, Germany)

Program Chairs: Ramanathan Subramanian (University of Canberra & IIT Ropar, Australia), Liang Zheng (Australian National University, Australia), Vivek K. Singh (Rutgers University, USA), Pablo Cesar (Centrum Wiskunde & Informatica, Netherlands), Lexing Xie (Australian National University, Australia), Dong Xu (University of Hong Kong, Hong Kong)

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

MM '24

Sponsor: SIGMM

MM '24: The 32nd ACM International Conference on Multimedia

October 28 - November 1, 2024

Melbourne VIC, Australia

Acceptance Rates

MM '24 Paper Acceptance Rate 1,150 of 4,385 submissions, 26%;

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%