ABSTRACT
Feature extraction is a crucial part of deep-learning-based stereo matching. Existing stereo matching algorithms perform poorly on small background objects and low-texture areas, which reduces disparity estimation accuracy. To improve disparity estimates, we propose TUNet, a Transformer-based iterative-update stereo matching network. After the feature extraction module, the Transformer's attention mechanism and positional encoding are applied, so the transformed features better incorporate global context information. In the disparity computation stage, we add an improved Gate Recurrent Unit (GRU) module drawn from the Recurrent All-Pairs Field Transforms for Optical Flow (RAFT) algorithm. In addition, feature correlations are computed at multiple resolutions; correlation lookup and disparity updates are performed at the highest resolution, and the final disparity map is obtained through iterative GRU updates. Our results show higher accuracy and denser disparity maps: matching of small background objects and low-texture areas improves, as does overall disparity accuracy. The proposed method is trained and evaluated on the ETH3D, Middlebury, and KITTI 2015 datasets. Compared with earlier stereo matching algorithms, it lowers the mismatch rate and exhibits improved robustness.
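To make the correlation-lookup step concrete, the following is a minimal NumPy sketch of the RAFT-style idea the abstract describes: build an all-pairs correlation volume along each scanline, then sample a small window of correlation values around the current disparity estimate at every pixel (the sampled patch is what would feed the GRU update in the full network). This is an illustrative simplification, not TUNet's implementation: it uses nearest-neighbour sampling instead of bilinear interpolation, operates at a single resolution, and the function names are our own.

```python
import numpy as np

def correlation_volume(feat_l, feat_r):
    """All-pairs scanline correlation: corr[h, x, d] = <feat_l[h, x], feat_r[h, x - d]>.

    feat_l, feat_r: (H, W, C) feature maps from a shared encoder.
    Returns a (H, W, W) volume; out-of-range displacements stay zero.
    """
    H, W, C = feat_l.shape
    corr = np.zeros((H, W, W), dtype=feat_l.dtype)
    for d in range(W):
        # Match left pixel x against right pixel x - d (valid for x >= d).
        corr[:, d:, d] = np.einsum('hwc,hwc->hw',
                                   feat_l[:, d:], feat_r[:, :W - d])
    return corr

def lookup(corr, disp, radius=2):
    """Sample correlation in a +-radius window around the current disparity.

    Uses nearest-neighbour indexing for simplicity; RAFT-style networks
    interpolate bilinearly so the lookup is differentiable in disp.
    Returns a (H, W, 2*radius+1) patch per pixel.
    """
    H, W, D = corr.shape
    offsets = np.arange(-radius, radius + 1)
    idx = np.clip(np.rint(disp[..., None]).astype(int) + offsets, 0, D - 1)
    h = np.arange(H)[:, None, None]
    w = np.arange(W)[None, :, None]
    return corr[h, w, idx]

# Sketch of the iterative loop: each step, the sampled patch (plus context
# features, omitted here) would drive a conv-GRU that predicts a residual
# disparity added to the running estimate.
```

In the full network the loop `disp += gru_update(lookup(corr, disp), context)` is repeated a fixed number of iterations, with the GRU's hidden state carried across iterations; the sketch above only shows the lookup that makes each iteration cheap regardless of the disparity range.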
Index Terms
- Transformer-based Iterative Update Stereo Matching Network