DOI: 10.1145/3503161.3547926
research-article

Self-Supervised Multi-view Stereo via Adjacent Geometry Guided Volume Completion

Published: 10 October 2022

ABSTRACT

Existing self-supervised multi-view stereo (MVS) approaches rely largely on photometric consistency for geometry inference and hence struggle with low-texture or non-Lambertian surfaces. In this paper, we observe that adjacent geometry shares certain commonalities that can help infer the correct geometry of challenging or low-confidence regions. Yet exploiting this property in an unsupervised MVS approach remains challenging because training data are lacking and consistency between views must be ensured. To address these issues, we propose a novel geometry inference training scheme that selectively masks regions with rich textures, where geometry can be well recovered and used as a supervisory signal, and then trains a deliberately designed cost volume completion network to recover the geometry of the masked regions. During inference, we instead mask the low-confidence regions and use the cost volume completion network for geometry correction. To deal with the different depth hypotheses of the cost volume pyramid, we design a three-branch volume inference structure for the completion network. Further, treating a plane as a special geometry, we first identify planar regions from pseudo labels and then correct the low-confidence pixels using high-confidence labels through plane normal consistency. Extensive experiments on DTU and Tanks & Temples demonstrate the effectiveness of the proposed framework and its state-of-the-art performance.
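
The switch between the training-time and inference-time masks can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the forward-difference texture proxy, the thresholds, and the choice to fill masked cost-volume entries with a constant. The completion network itself is not modelled; the sketch only shows which pixels would be hidden in each phase.

```python
def texture_score(gray):
    """Texture proxy (assumed): sum of absolute forward differences in x and y."""
    h, w = len(gray), len(gray[0])
    score = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx = gray[y][min(x + 1, w - 1)] - gray[y][x]
            dy = gray[min(y + 1, h - 1)][x] - gray[y][x]
            score[y][x] = abs(dx) + abs(dy)
    return score


def training_mask(gray, texture_thresh=0.1):
    """Training: hide richly textured pixels, where recovered geometry can act as a label."""
    return [[s > texture_thresh for s in row] for row in texture_score(gray)]


def inference_mask(confidence, conf_thresh=0.5):
    """Inference: hide low-confidence pixels so they can be re-estimated from neighbours."""
    return [[c < conf_thresh for c in row] for row in confidence]


def apply_mask(cost_volume, mask, fill=0.0):
    """Return a copy of cost_volume[d][y][x] with masked pixels set to `fill` at every depth."""
    return [
        [[fill if mask[y][x] else v for x, v in enumerate(row)]
         for y, row in enumerate(plane)]
        for plane in cost_volume
    ]


if __name__ == "__main__":
    gray = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]        # one vertical edge
    volume = [[[1.0] * 4 for _ in range(4)] for _ in range(3)]  # 3 depth hypotheses
    masked = apply_mask(volume, training_mask(gray))
    print(sum(v for plane in masked for row in plane for v in row))
```

In this toy setup the same `apply_mask` call serves both phases; only the mask source changes, which mirrors the train-on-textured, correct-on-uncertain idea described in the abstract.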

Supplemental Material

MM22-fp0750.mp4 (mp4, 205.8 MB)


      Published in MM '22: Proceedings of the 30th ACM International Conference on Multimedia, October 2022, 7537 pages
      ISBN: 9781450392037
      DOI: 10.1145/3503161

      Copyright © 2022 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher: Association for Computing Machinery, New York, NY, United States


      Overall Acceptance Rate: 995 of 4,171 submissions, 24%
