Abstract
Existing learning-based multi-view stereo (MVS) models primarily predict depth maps through a cascaded structure to achieve more robust reconstruction. However, they tend to focus on improving the quality of stereo matching while overlooking the importance of depth hypotheses. In this paper, we propose a novel MVS model from the perspective of probability volume analysis. First, we use the probability volume to guide depth refinement. Ideally, the probability distribution along the depth dimension of the probability volume is unimodal, so we design a unimodal curve to fit this pattern. A reasonable depth refinement range is then selected adaptively for each pixel based on a predefined probability threshold. Additionally, because matching noise can blur the unimodal peak of the probability volume, we design the probability volume split-merge module (PVS-PVM). This module performs a peak search under conditional constraints, splits the probability volume into a main and a sub probability volume, and computes a set of depth hypotheses from each. Finally, new main and sub probability volumes are computed from these depth hypotheses and merged to predict the depth. This approach accounts more fully for the high-probability regions and improves the robustness of the depth hypotheses. Experimental results on the DTU and Tanks & Temples datasets demonstrate that our method effectively uses probability volume information to guide depth map refinement and improves reconstruction results. Our code will be released at https://github.com/zongh5a/ProbMVSNet.
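The abstract describes the adaptive range selection and the peak search only at a high level. The following is a minimal per-pixel sketch in plain Python, under assumptions that are not taken from the paper: the unimodal curve is assumed to be a Gaussian fitted by moments, with its height set to the largest observed probability, and the refinement range is the interval where that curve stays at or above the threshold `tau`, i.e. |d - mu| <= sigma * sqrt(2 ln(peak / tau)). All function names and the parameters `tau`, `min_ratio`, and `min_gap` are hypothetical, and `split_peaks` is only one possible reading of "a peak search based on conditional constraints".

```python
import math


def fit_unimodal(depths, probs):
    """Fit a Gaussian-shaped curve to one pixel's depth distribution.

    Assumption (not from the paper): mean and standard deviation are the
    moments of the renormalized distribution, and the curve's height is the
    largest observed probability.
    """
    total = sum(probs)
    weights = [p / total for p in probs]
    mu = sum(w * d for w, d in zip(weights, depths))
    var = sum(w * (d - mu) ** 2 for w, d in zip(weights, depths))
    return mu, math.sqrt(var), max(probs)


def refinement_range(depths, probs, tau=0.05):
    """Depth interval where the fitted curve stays at or above tau.

    Solves peak * exp(-(d - mu)**2 / (2 * sigma**2)) >= tau for d and clips
    the result to the current hypothesis range. tau is a hypothetical default.
    """
    mu, sigma, peak = fit_unimodal(depths, probs)
    if sigma == 0.0 or peak <= tau:
        # Degenerate or too-flat curve: collapse the range onto the mean.
        return mu, mu
    half_width = sigma * math.sqrt(2.0 * math.log(peak / tau))
    return max(min(depths), mu - half_width), min(max(depths), mu + half_width)


def split_peaks(probs, min_ratio=0.3, min_gap=2):
    """Hypothetical constrained peak search for a split step.

    Returns (main_idx, sub_idx). The sub peak must be a local maximum at
    least min_gap bins from the main peak and at least min_ratio times its
    height; sub_idx is None when no bin qualifies.
    """
    main = max(range(len(probs)), key=lambda i: probs[i])
    sub = None
    for i, p in enumerate(probs):
        left = probs[i - 1] if i > 0 else float("-inf")
        right = probs[i + 1] if i + 1 < len(probs) else float("-inf")
        if p < left or p < right:
            continue  # not a local maximum
        if i == main or abs(i - main) < min_gap or p < min_ratio * probs[main]:
            continue  # the main peak itself, too close to it, or too weak
        if sub is None or p > probs[sub]:
            sub = i
    return main, sub


if __name__ == "__main__":
    depths = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    sharp = [0.02, 0.05, 0.15, 0.40, 0.20, 0.08, 0.06, 0.04]
    blurred = [0.05, 0.30, 0.10, 0.05, 0.08, 0.25, 0.12, 0.05]
    print(refinement_range(depths, sharp, tau=0.05))
    print(refinement_range(depths, sharp, tau=0.2))  # narrower interval
    print(split_peaks(sharp))    # → (3, None)
    print(split_peaks(blurred))  # → (1, 5)
```

With the sharp distribution the search finds no qualifying secondary peak, so only the adaptive range applies; with the blurred one it returns a second peak around which a sub set of depth hypotheses could be sampled. A practical version would operate on the whole probability volume with tensor operations rather than per pixel on Python lists.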

Data Availability
The datasets used in this study can be downloaded from https://github.com/YoYo000/MVSNet, and the results of the proposed model on the Tanks and Temples dataset have been submitted to https://www.tanksandtemples.org/leaderboard/.
Funding
This work was supported in part by the National Natural Science Foundation of China (No. 62105258, No. 62272383).
Ethics declarations
Conflict of Interest
We declare that we have no commercial or associative interest that represents a conflict of interest in connection with the work submitted.
Ethics Approval and Consent to Participate
Ethics approval was not required for this research.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yu, Z., Wang, H., Li, J. et al. Rethinking probability volume for multi-view stereo: A probability analysis method. Appl Intell 55, 396 (2025). https://doi.org/10.1007/s10489-025-06284-w
DOI: https://doi.org/10.1007/s10489-025-06284-w