Abstract
We present an efficient multi-view stereo (MVS) network for 3D reconstruction from multi-view images. Although existing state-of-the-art methods achieve satisfactory results, accuracy and scalability remain open problems because dense matching is unreliable and cost volume regularization consumes a large amount of memory. To address this, we propose a multi-level fusion aware feature pyramid based multi-view stereo network (MFNet) for reliable depth inference. First, we adopt a coarse-to-fine strategy that derives the high-resolution depth estimate from a coarse depth map. At each stage, the strategy narrows the depth search interval using the prior from the previous stage, which dramatically reduces memory consumption. Second, we apply multi-level fusion when constructing the feature pyramid, so that features at different levels exchange information with each other and yield rich multi-level representations. Finally, we build the cost volume with a group-wise correlation similarity measure in place of the variance-based approach used in previous works, resulting in a lightweight and effective cost volume representation. Experimental results on the DTU, Tanks & Temples, and BlendedMVS benchmark datasets show that MFNet achieves better results than the state-of-the-art methods.
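To make two of the ideas above concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the common formulation of group-wise correlation (channels split into equal groups, each reduced to the mean of element-wise products) and a symmetric, evenly spaced set of depth hypotheses centred on the previous stage's estimate. The function names, the single-pixel feature vectors, and all numeric values are hypothetical and chosen only for illustration.

```python
from typing import List


def groupwise_correlation(ref: List[float], src: List[float], num_groups: int) -> List[float]:
    """Per-pixel group-wise correlation between two feature vectors.

    The C channels are split into num_groups equal groups, and each group
    contributes the mean of the element-wise products of its channels.
    The result has num_groups entries instead of C, which is what makes
    the resulting cost volume lighter than a C-channel variance volume.
    """
    channels = len(ref)
    if channels != len(src) or channels % num_groups != 0:
        raise ValueError("feature lengths must match and divide evenly into groups")
    size = channels // num_groups
    result = []
    for g in range(num_groups):
        ref_group = ref[g * size:(g + 1) * size]
        src_group = src[g * size:(g + 1) * size]
        result.append(sum(r * s for r, s in zip(ref_group, src_group)) / size)
    return result


def narrowed_hypotheses(prev_depth: float, num_hypotheses: int, interval: float) -> List[float]:
    """Evenly spaced depth hypotheses centred on the previous stage's estimate.

    Assumption: the search range is symmetric around prev_depth; the
    paper's exact range rule is not given in the abstract.
    """
    half_span = (num_hypotheses - 1) * interval / 2.0
    return [prev_depth - half_span + i * interval for i in range(num_hypotheses)]


# Hypothetical 4-channel features for one pixel, split into 2 groups.
similarity = groupwise_correlation([1.0, 2.0, 3.0, 4.0], [1.0, 0.0, 2.0, 2.0], 2)

# Hypothetical coarse estimate of 10.0 refined with 5 hypotheses spaced 0.5 apart.
hypotheses = narrowed_hypotheses(10.0, 5, 0.5)
```

With 32-channel features and 8 groups, for instance, each pixel and depth hypothesis stores 8 similarity values rather than 32 variance values. Likewise, a later stage only needs to evaluate a handful of hypotheses around the coarse estimate instead of sweeping the full depth range.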
Acknowledgements
This work was supported in part by the National Key Research and Development Program of China under Grant 2020YFC1523100 and in part by the National Natural Science Foundation of China under Grant 61877016.
Cite this article
Cai, Y., Li, L., Wang, D. et al. MFNet: Multi-level fusion aware feature pyramid based multi-view stereo network for 3D reconstruction. Appl Intell 53, 4289–4301 (2023). https://doi.org/10.1007/s10489-022-03754-3
DOI: https://doi.org/10.1007/s10489-022-03754-3