MFNet: Multi-level fusion aware feature pyramid based multi-view stereo network for 3D reconstruction

Cai, Youcheng; Li, Lin; Wang, Dong; Liu, Xiaoping

doi:10.1007/s10489-022-03754-3

Published: 07 June 2022

Volume 53, pages 4289–4301, (2023)

Applied Intelligence

Abstract

We present an efficient multi-view stereo (MVS) network for 3D reconstruction from multi-view images. Although existing state-of-the-art methods achieve satisfactory results, accuracy and scalability remain open problems because dense matching is unreliable and cost volume regularization is memory-intensive. To address this, we propose MFNet, a multi-level fusion aware feature pyramid based multi-view stereo network for reliable depth inference. First, we adopt a coarse-to-fine strategy that derives high-resolution depth estimates from a coarse depth map. This strategy gradually narrows the depth search interval using prior information from the previous stage, which dramatically reduces memory consumption. Second, we perform multi-level fusion when constructing the feature pyramid so that features at different levels exchange information, yielding rich multi-level feature representations. Finally, we replace the variance-based similarity measure used in previous works with group-wise correlation for cost volume construction, resulting in a lightweight and effective cost volume representation. Experimental results on the DTU, Tanks & Temples, and BlendedMVS benchmark datasets show that MFNet achieves better results than state-of-the-art methods.
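
As a rough illustration of two ideas named in the abstract, the sketch below contrasts a variance-based cost with a group-wise correlation cost at a single pixel and depth hypothesis, and shows one way a coarse depth estimate could narrow the search interval at the next stage. It is not the authors' implementation: the function names, the per-pixel formulation, the two-view setting, and the centered hypothesis spacing are all assumptions made for illustration. How MFNet aggregates several source views or chooses its intervals is not stated in the abstract.

```python
from typing import List, Sequence


def groupwise_correlation(ref: Sequence[float], src: Sequence[float],
                          groups: int) -> List[float]:
    """Split the C feature channels into `groups` equal groups and return,
    for each group, the mean of the channel-wise products of the reference
    and (warped) source features. Output length is `groups`, not C."""
    channels = len(ref)
    if channels != len(src) or channels % groups != 0:
        raise ValueError("feature lengths must match and be divisible by groups")
    size = channels // groups
    return [
        sum(ref[c] * src[c] for c in range(g * size, (g + 1) * size)) / size
        for g in range(groups)
    ]


def variance_cost(views: Sequence[Sequence[float]]) -> List[float]:
    """Per-channel variance across all views: one value per channel,
    so the cost keeps all C channels."""
    n = len(views)
    channels = len(views[0])
    means = [sum(v[c] for v in views) / n for c in range(channels)]
    return [sum((v[c] - means[c]) ** 2 for v in views) / n
            for c in range(channels)]


def narrowed_hypotheses(coarse_depth: float, interval: float,
                        count: int) -> List[float]:
    """Hypothetical refinement step: place `count` depth hypotheses,
    spaced by `interval`, centered on the coarse estimate."""
    start = coarse_depth - interval * (count - 1) / 2
    return [start + i * interval for i in range(count)]


# Toy features with C = 4 channels for one pixel at one depth hypothesis.
ref_feature = [1.0, 2.0, 3.0, 4.0]
src_feature = [1.0, 0.0, 1.0, 2.0]

gwc = groupwise_correlation(ref_feature, src_feature, groups=2)
var = variance_cost([ref_feature, src_feature])
hyps = narrowed_hypotheses(coarse_depth=10.0, interval=0.5, count=5)
```

With G groups the cost at each pixel and depth has G entries instead of C, which is where the memory saving in the cost volume comes from under these assumptions. Likewise, a finer stage only needs a handful of hypotheses around the coarse estimate rather than a sweep over the full depth range.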

Acknowledgements

This work was supported in part by the National Key Research and Development Program of China under Grant 2020YFC1523100 and in part by the National Natural Science Foundation of China under Grant 61877016.

Author information

Authors and Affiliations

The School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, 230009, China
Youcheng Cai, Lin Li, Dong Wang & Xiaoping Liu
The Engineering Research Center of Safety Critical Industrial Measurement and Control Technology, Ministry of Education, Hefei, 230009, China
Xiaoping Liu

Corresponding author

Correspondence to Xiaoping Liu.

About this article

Cite this article

Cai, Y., Li, L., Wang, D. et al. MFNet: Multi-level fusion aware feature pyramid based multi-view stereo network for 3D reconstruction. Appl Intell 53, 4289–4301 (2023). https://doi.org/10.1007/s10489-022-03754-3

Accepted: 10 May 2022
Published: 07 June 2022
Issue Date: February 2023
DOI: https://doi.org/10.1007/s10489-022-03754-3
