DOI: 10.1145/3573428.3573487
EITCE Conference Proceedings · research-article

Transformer-based Iterative Update Stereo Matching Network

Published: 15 March 2023

ABSTRACT

Feature extraction is a crucial component of deep-learning-based stereo matching. Existing stereo matching algorithms match poorly on small objects in the background and on low-texture areas, which degrades disparity estimation accuracy. To improve the accuracy of disparity estimates, this paper proposes TUNet, a Transformer-based iterative update stereo matching network. After the feature extraction module, the attention mechanism and positional encoding of the Transformer are added to the network, enabling the transformed features to better incorporate global context. In the disparity calculation stage, an improved Gate Recurrent Unit (GRU) module from the Recurrent All-Pairs Field Transforms for Optical Flow (RAFT) algorithm is added, and feature correlations are computed at multiple resolutions. Correlation lookup and disparity updates are performed at the highest resolution, and the final disparity map is obtained through iterative GRU updates. The resulting disparity maps are denser and more accurate: matching on small background objects and low-texture areas improves, as does overall disparity accuracy. The proposed method is trained and evaluated on the ETH3D, Middlebury, and KITTI 2015 datasets, where it reduces the mismatching rate and shows greater robustness than earlier stereo matching algorithms.
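The correlation-lookup-and-update loop described above can be sketched in a few lines. This is a minimal illustrative sketch under stated assumptions, not the paper's implementation: the learned GRU update is replaced by a simple local-argmax step, the features are toy one-hot column descriptors rather than learned ones, and the names `correlation_volume`, `lookup`, and `iterative_update` are invented for the demo.

```python
import numpy as np

def correlation_volume(feat_l, feat_r, max_disp):
    """Per-scanline correlation: for each left-image pixel, dot-product its
    feature vector against right-image features at candidate disparities
    0..max_disp-1 (out-of-range candidates keep a zero score)."""
    C, H, W = feat_l.shape
    vol = np.zeros((max_disp, H, W), dtype=np.float32)
    for d in range(max_disp):
        vol[d, :, d:] = (feat_l[:, :, d:] * feat_r[:, :, :W - d]).sum(axis=0)
    return vol / np.sqrt(C)

def lookup(vol, disp, radius=2):
    """Sample the correlation volume in a window of +/-radius disparities
    around the current estimate (nearest-neighbour indexing)."""
    D = vol.shape[0]
    offsets = np.arange(-radius, radius + 1)[:, None, None]
    idx = np.clip(np.round(disp[None] + offsets).astype(int), 0, D - 1)
    return np.take_along_axis(vol, idx, axis=0)  # (2*radius+1, H, W)

def iterative_update(vol, iters=6, radius=2):
    """Iteratively refine disparity: look up local correlation scores and
    step toward the best-scoring candidate (a stand-in for the learned
    GRU update, which would instead predict a residual)."""
    D, H, W = vol.shape
    disp = np.full((H, W), D // 2, dtype=np.float32)  # coarse mid-range init
    for _ in range(iters):
        win = lookup(vol, disp, radius)
        step = win.argmax(axis=0) - radius
        disp = np.clip(disp + step, 0, D - 1)
    return disp

# Toy stereo pair: one-hot column descriptors make correlations exact,
# with the left image shifted by a known ground-truth disparity of 3.
H, W, shift = 4, 16, 3
feat_r = np.zeros((W, H, W), dtype=np.float32)
for x in range(W):
    feat_r[x, :, x] = 1.0
feat_l = np.zeros_like(feat_r)
feat_l[:, :, shift:] = feat_r[:, :, :W - shift]

vol = correlation_volume(feat_l, feat_r, max_disp=10)
disp = iterative_update(vol)  # columns with valid matches converge to 3
```

In the full network, the argmax step is replaced by the GRU's predicted residual, the lookup uses bilinear sampling, and correlations are pooled across several resolutions before the highest-resolution update.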


Published in

EITCE '22: Proceedings of the 2022 6th International Conference on Electronic Information Technology and Computer Engineering
October 2022, 1999 pages
ISBN: 9781450397148
DOI: 10.1145/3573428
Copyright © 2022 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States



Acceptance Rates

Overall acceptance rate: 508 of 972 submissions, 52%
