Rethinking probability volume for multi-view stereo: A probability analysis method

Applied Intelligence

Abstract

Existing learning-based multi-view stereo (MVS) models primarily predict depth maps through a cascaded structure to achieve more robust reconstruction results. However, they often emphasize improving the quality of stereo matching while overlooking the importance of depth hypotheses. In this paper, we propose a novel MVS model from the perspective of probability volume analysis. First, we consider the guiding effect of the probability volume on depth refinement. Ideally, the probability distribution along the depth dimension of the probability volume follows a unimodal pattern, so we design a unimodal curve to fit this pattern. A reasonable depth refinement range is then adaptively selected for each pixel position based on a predefined probability threshold. Additionally, because matching noise may blur the unimodal peak of the probability volume, we design the probability volume split-merge module (PVS-PVM). This module performs a peak search under conditional constraints, splits the probability volume into main and sub probability volumes, and computes two sets of depth hypotheses from them. New main and sub probability volumes are then computed from these depth hypotheses and merged to predict the depth. This approach considers the regions of higher probability more comprehensively, improving the robustness of the depth hypotheses. Experimental results demonstrate that our method effectively uses probability volume information to guide depth map refinement and yields enhanced reconstruction results on the DTU and Tanks & Temples datasets. Our code will be released at https://github.com/zongh5a/ProbMVSNet.
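The two ideas in the abstract, threshold-based selection of a refinement range and a constrained peak search, can be illustrated for a single pixel with a minimal sketch. This is not the authors' implementation: the function names `refinement_range` and `split_peaks`, the contiguity rule around the peak, the local-maximum condition, and the example threshold values are all assumptions made for illustration, and the unimodal curve fitting described in the abstract is not reproduced here.

```python
from typing import List, Optional, Tuple


def refinement_range(probs: List[float], depths: List[float],
                     threshold: float) -> Tuple[float, float]:
    """Hypothetical helper: depth interval around the peak whose
    neighbouring hypotheses all have probability >= threshold."""
    if not probs or len(probs) != len(depths):
        raise ValueError("probs and depths must be non-empty and equal length")
    peak = max(range(len(probs)), key=probs.__getitem__)
    lo = hi = peak
    # Grow the interval outwards while neighbours stay above the threshold.
    while lo > 0 and probs[lo - 1] >= threshold:
        lo -= 1
    while hi < len(probs) - 1 and probs[hi + 1] >= threshold:
        hi += 1
    return depths[lo], depths[hi]


def split_peaks(probs: List[float], min_prob: float) -> Tuple[int, Optional[int]]:
    """Hypothetical helper: index of the main peak and, if one exists,
    of a secondary peak. The constraints used here (local maximum,
    probability >= min_prob, not adjacent to the main peak) are assumed."""
    n = len(probs)
    main = max(range(n), key=probs.__getitem__)

    def is_local_max(i: int) -> bool:
        left_ok = i == 0 or probs[i] >= probs[i - 1]
        right_ok = i == n - 1 or probs[i] >= probs[i + 1]
        return left_ok and right_ok

    candidates = [i for i in range(n)
                  if abs(i - main) > 1 and probs[i] >= min_prob and is_local_max(i)]
    sub = max(candidates, key=probs.__getitem__) if candidates else None
    return main, sub


# Per-pixel probabilities over eight depth hypotheses (illustrative values).
probs = [0.02, 0.05, 0.30, 0.40, 0.15, 0.03, 0.04, 0.01]
depths = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
print(refinement_range(probs, depths, threshold=0.1))
print(split_peaks(probs, min_prob=0.03))
```

In this sketch, a sharper distribution yields a narrower refinement interval, while a second peak that satisfies the constraints gives a second candidate region from which a separate set of depth hypotheses could be drawn.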

Data Availability

The datasets used in this study can be downloaded from https://github.com/YoYo000/MVSNet. The results of the proposed model on the Tanks and Temples dataset have been submitted to https://www.tanksandtemples.org/leaderboard/.

Funding

This work was supported in part by the National Natural Science Foundation of China (No. 62105258, No. 62272383).

Author information

Corresponding author

Correspondence to Huaijun Wang.

Ethics declarations

Conflict of Interest

We declare that we have no commercial or associative interest that represents a conflict of interest in connection with the submitted work.

Ethics Approval and Consent to Participate

Ethics approval was not required for this research.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yu, Z., Wang, H., Li, J. et al. Rethinking probability volume for multi-view stereo: A probability analysis method. Appl Intell 55, 396 (2025). https://doi.org/10.1007/s10489-025-06284-w

  • DOI: https://doi.org/10.1007/s10489-025-06284-w
