
Vis-MVSNet: Visibility-Aware Multi-view Stereo Network

Published in: International Journal of Computer Vision

Abstract

Learning-based multi-view stereo (MVS) methods have demonstrated promising results. However, very few existing networks explicitly take pixel-wise visibility into consideration, resulting in erroneous cost aggregation from occluded pixels. In this paper, we explicitly infer and integrate pixel-wise occlusion information in the MVS network via matching uncertainty estimation. The pair-wise uncertainty map is jointly inferred with the pair-wise depth map and is then used as weighting guidance during the multi-view cost volume fusion, so that the adverse influence of occluded pixels is suppressed. The proposed framework, Vis-MVSNet, significantly improves depth accuracy when reconstructing scenes with severe occlusion. Extensive experiments on the DTU, BlendedMVS, Tanks and Temples, and ETH3D datasets demonstrate the effectiveness of the proposed framework.
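To make the uncertainty-weighted fusion step concrete, below is a minimal PyTorch sketch, not the authors' released implementation. The function name fuse_cost_volumes and the exp(-uncertainty) weighting are illustrative assumptions about how a pair-wise uncertainty map might down-weight occluded pixels when the per-view cost volumes are fused.

```python
# A minimal sketch (assumed interface, not the released Vis-MVSNet code) of
# visibility-aware cost volume fusion: pair-wise cost volumes are combined with
# per-pixel weights derived from pair-wise matching uncertainty maps.
import torch

def fuse_cost_volumes(pair_costs, pair_uncerts, eps=1e-6):
    """Fuse per-view cost volumes with visibility weights.

    pair_costs:   list of tensors, each (B, C, D, H, W), one cost volume per
                  reference/source pair.
    pair_uncerts: list of tensors, each (B, 1, H, W), matching uncertainty for
                  the same pair (larger value = less reliable / likely occluded).
    Returns a fused cost volume of shape (B, C, D, H, W).
    """
    weights = []
    for u in pair_uncerts:
        # Assumed mapping from uncertainty to a visibility weight in (0, 1]:
        # occluded or unreliable pixels (large uncertainty) get small weights.
        w = torch.exp(-u)                  # (B, 1, H, W)
        weights.append(w.unsqueeze(2))     # (B, 1, 1, H, W), broadcast over depth
    weight_sum = torch.stack(weights).sum(0) + eps

    fused = torch.zeros_like(pair_costs[0])
    for cost, w in zip(pair_costs, weights):
        fused = fused + cost * w           # suppress occluded pixels per view
    return fused / weight_sum              # normalize by total visibility weight


# Usage with random tensors standing in for two source views:
B, C, D, H, W = 1, 8, 48, 64, 80
costs = [torch.randn(B, C, D, H, W) for _ in range(2)]
uncerts = [torch.rand(B, 1, H, W) for _ in range(2)]
fused = fuse_cost_volumes(costs, uncerts)
print(fused.shape)  # torch.Size([1, 8, 48, 64, 80])
```

The key design point is that each source view contributes to the fused cost volume in proportion to its estimated reliability at every pixel, rather than averaging all views uniformly.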


Acknowledgements

This work was supported by Hong Kong RGC GRF 16206819, 16203518 and T22-603/15N.

Author information


Corresponding author

Correspondence to Yao Yao.

Additional information

Communicated by William Smith.



About this article


Cite this article

Zhang, J., Li, S., Luo, Z. et al. Vis-MVSNet: Visibility-Aware Multi-view Stereo Network. Int J Comput Vis 131, 199–214 (2023). https://doi.org/10.1007/s11263-022-01697-3

