ABSTRACT
Existing self-supervised multi-view stereo (MVS) approaches largely rely on photometric consistency for geometry inference, and hence struggle in low-texture or non-Lambertian regions. In this paper, we observe that adjacent geometry shares commonalities that can help infer the correct geometry of challenging or low-confidence regions. Yet exploiting this property in an unsupervised MVS approach remains challenging because training data is lacking and consistency between views must be ensured. To address these issues, we propose a novel geometry inference training scheme that selectively masks regions with rich textures, where geometry can be recovered well and used as a supervisory signal, and then trains a purpose-designed cost volume completion network to recover the geometry of the masked regions. During inference, we instead mask the low-confidence regions and use the cost volume completion network for geometry correction. To handle the different depth hypotheses of the cost volume pyramid, we design a three-branch volume inference structure for the completion network. Further, treating planes as a special geometry, we first identify planar regions from pseudo labels and then correct low-confidence pixels using high-confidence labels through plane normal consistency. Extensive experiments on DTU and Tanks & Temples demonstrate the effectiveness of the proposed framework and its state-of-the-art performance.
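The masking switch and the plane-based correction described above can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration rather than the paper's implementation: the function names, the thresholds, the use of image gradient magnitude as a texture measure, and the replacement of the plane normal consistency step with a least-squares fit of inverse depth as an affine function of pixel coordinates (which holds exactly for a 3D plane seen by a pinhole camera).

```python
import numpy as np

def texture_score(gray):
    """Gradient magnitude as a crude texture measure (assumed, not the paper's criterion)."""
    gy, gx = np.gradient(gray.astype(np.float64))
    return np.hypot(gx, gy)

def select_mask(gray, confidence, training, texture_thresh=0.1, conf_thresh=0.5):
    """Boolean mask of pixels whose cost-volume entries would be hidden.

    Training: mask richly textured pixels, whose geometry is assumed reliable
    enough to serve as the supervisory signal for completion.
    Inference: mask low-confidence pixels so the completion network fills them.
    Both thresholds are hypothetical values.
    """
    if training:
        return texture_score(gray) > texture_thresh
    return confidence < conf_thresh

def correct_planar_region(depth, confidence, region, conf_thresh=0.5):
    """Refit low-confidence depths inside one planar region.

    Fits 1/depth = a*u + b*v + c to the high-confidence pixels of the region,
    then overwrites the low-confidence pixels with the fitted values.
    """
    vs, us = np.nonzero(region)
    reliable = confidence[vs, us] >= conf_thresh
    corrected = depth.astype(np.float64).copy()
    if reliable.sum() < 3:
        return corrected  # too few reliable pixels to define a plane
    A = np.column_stack([us, vs, np.ones_like(us)]).astype(np.float64)
    inv_depth = 1.0 / corrected[vs, us]
    coeffs, *_ = np.linalg.lstsq(A[reliable], inv_depth[reliable], rcond=None)
    fix = ~reliable
    corrected[vs[fix], us[fix]] = 1.0 / (A[fix] @ coeffs)
    return corrected
```

In this sketch, `select_mask(..., training=True)` and `select_mask(..., training=False)` differ only in which signal drives the mask, mirroring the train-time and test-time behaviour the abstract describes. The actual completion network and the three-branch structure are not modelled here.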
Index Terms
- Self-Supervised Multi-view Stereo via Adjacent Geometry Guided Volume Completion