research-article

Self-Supervised Multi-view Stereo via Adjacent Geometry Guided Volume Completion

Authors:
Luoyuan Xu
School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China

Tao Guan
School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China

Yuesong Wang
School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China

Yawei Luo
School of Computer Science and Technology, Zhejiang University, Hangzhou, China

Zhuo Chen
School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China

Wenkai Liu
School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China

Wei Yang
School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China

MM '22: Proceedings of the 30th ACM International Conference on Multimedia, October 2022, Pages 2202–2210
https://doi.org/10.1145/3503161.3547926

Published: 10 October 2022

ABSTRACT

Existing self-supervised multi-view stereo (MVS) approaches rely largely on photometric consistency for geometry inference, and hence struggle with low-texture or non-Lambertian surfaces. In this paper, we observe that adjacent geometry shares certain commonalities that can help infer the correct geometry of challenging or low-confidence regions. Yet exploiting this property in an unsupervised MVS approach remains challenging, because training data is lacking and consistency between views must be ensured. To address these issues, we propose a novel geometry inference training scheme that selectively masks regions with rich textures, where geometry can be well recovered and used as a supervisory signal, and then trains a purpose-designed cost volume completion network to recover the geometry of the masked regions. During inference, we instead mask the low-confidence regions and use the cost volume completion network for geometry correction. To handle the different depth hypotheses of the cost volume pyramid, we design a three-branch volume inference structure for the completion network. Further, treating planes as a special kind of geometry, we first identify planar regions from pseudo labels and then correct the low-confidence pixels using high-confidence labels through plane normal consistency. Extensive experiments on DTU and Tanks & Temples demonstrate the effectiveness of the proposed framework and its state-of-the-art performance.
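
The masking scheme described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the per-pixel texture score, the confidence map, both thresholds, the fill value, and the function names `select_mask` and `mask_cost_volume` are all assumptions introduced here for illustration, and the completion network itself is omitted. The sketch only shows the switch between the two masking modes: texture-rich pixels are masked during training, and low-confidence pixels are masked during inference.

```python
from typing import List

Grid = List[List[float]]   # per-pixel scores, indexed [row][col]
Mask = List[List[bool]]    # True where the cost volume is masked out


def select_mask(texture: Grid, confidence: Grid, training: bool,
                texture_thresh: float = 0.5, conf_thresh: float = 0.5) -> Mask:
    """Choose which pixels to mask (thresholds are illustrative assumptions).

    Training: mask texture-rich pixels, whose recovered geometry is assumed
    reliable enough to act as the supervisory signal.
    Inference: mask low-confidence pixels, which are to be re-inferred.
    """
    if training:
        return [[t > texture_thresh for t in row] for row in texture]
    return [[c < conf_thresh for c in row] for row in confidence]


def mask_cost_volume(volume: List[Grid], mask: Mask,
                     fill: float = 0.0) -> List[Grid]:
    """Blank every depth hypothesis at masked pixels; volume is [D][H][W]."""
    return [
        [[fill if mask[y][x] else cost for x, cost in enumerate(row)]
         for y, row in enumerate(depth_slice)]
        for depth_slice in volume
    ]


# Toy 2x2 image with two depth hypotheses (all values are made up).
texture = [[0.9, 0.1], [0.2, 0.8]]
confidence = [[0.95, 0.3], [0.4, 0.9]]
volume = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]

train_mask = select_mask(texture, confidence, training=True)
infer_mask = select_mask(texture, confidence, training=False)
masked_train = mask_cost_volume(volume, train_mask)
masked_infer = mask_cost_volume(volume, infer_mask)
```

In this toy example the two masks are complementary only because of the chosen values; in general, texture richness and confidence are separate signals, and how the paper computes either one is not stated in the abstract.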

Supplemental Material

MM22-fp0750.mp4 (mp4, 205.8 MB)


Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Reconstruction

Published in
MM '22: Proceedings of the 30th ACM International Conference on Multimedia
October 2022
7537 pages
ISBN:9781450392037
DOI:10.1145/3503161
General Chairs:
João Magalhães, NOVA University of Lisbon, Portugal
Alberto del Bimbo, University of Florence, Italy
Shin'ichi Satoh, National Institute of Informatics, Japan
Nicu Sebe, University of Trento, Italy
Program Chairs:
Xavier Alameda-Pineda, Inria, Grenoble, France
Qin Jin, Renmin University of China, China
Vincent Oria, New Jersey Institute of Technology, USA
Laura Toni, University College London, UK
Copyright © 2022 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
adjacent geometry guided inference
cost volume completion
multi-view stereo
self-supervised learning
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 995 of 4,171 submissions, 24%