ABSTRACT
Feature extraction is a crucial part of deep-learning-based stereo matching. Existing stereo matching algorithms perform poorly on small background objects and low-texture areas, which reduces disparity estimation accuracy. To improve disparity estimates, we propose TUNet, a Transformer-based iterative-update stereo matching network. After the feature extraction module, the Transformer's attention mechanism and positional encoding are applied, so the transformed features better incorporate global context information. In the disparity computation stage, we add an improved Gate Recurrent Unit (GRU) module drawn from the Recurrent All-Pairs Field Transforms for Optical Flow (RAFT) algorithm. In addition, feature correlations are computed at multiple resolutions; correlation lookup and disparity updates are performed at the highest resolution, and the final disparity map is obtained through iterative GRU updates. Our results show higher accuracy and denser disparity maps: matching of small background objects and low-texture areas improves, as does overall disparity accuracy. The proposed method is trained and evaluated on the ETH3D, Middlebury, and KITTI 2015 datasets. Compared with earlier stereo matching algorithms, it lowers the mismatch rate and exhibits improved robustness.
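To make the correlation-lookup step concrete, the following is a minimal NumPy sketch of the RAFT-style idea the abstract describes: build an all-pairs correlation volume along each scanline, then sample a small window of correlation values around the current disparity estimate at every pixel (the sampled patch is what would feed the GRU update in the full network). This is an illustrative simplification, not TUNet's implementation: it uses nearest-neighbour sampling instead of bilinear interpolation, operates at a single resolution, and the function names are our own.

```python
import numpy as np

def correlation_volume(feat_l, feat_r):
    """All-pairs scanline correlation: corr[h, x, d] = <feat_l[h, x], feat_r[h, x - d]>.

    feat_l, feat_r: (H, W, C) feature maps from a shared encoder.
    Returns a (H, W, W) volume; out-of-range displacements stay zero.
    """
    H, W, C = feat_l.shape
    corr = np.zeros((H, W, W), dtype=feat_l.dtype)
    for d in range(W):
        # Match left pixel x against right pixel x - d (valid for x >= d).
        corr[:, d:, d] = np.einsum('hwc,hwc->hw',
                                   feat_l[:, d:], feat_r[:, :W - d])
    return corr

def lookup(corr, disp, radius=2):
    """Sample correlation in a +-radius window around the current disparity.

    Uses nearest-neighbour indexing for simplicity; RAFT-style networks
    interpolate bilinearly so the lookup is differentiable in disp.
    Returns a (H, W, 2*radius+1) patch per pixel.
    """
    H, W, D = corr.shape
    offsets = np.arange(-radius, radius + 1)
    idx = np.clip(np.rint(disp[..., None]).astype(int) + offsets, 0, D - 1)
    h = np.arange(H)[:, None, None]
    w = np.arange(W)[None, :, None]
    return corr[h, w, idx]

# Sketch of the iterative loop: each step, the sampled patch (plus context
# features, omitted here) would drive a conv-GRU that predicts a residual
# disparity added to the running estimate.
```

In the full network the loop `disp += gru_update(lookup(corr, disp), context)` is repeated a fixed number of iterations, with the GRU's hidden state carried across iterations; the sketch above only shows the lookup that makes each iteration cheap regardless of the disparity range.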
Index Terms
- Transformer-based Iterative Update Stereo Matching Network