Skip to main content
Log in

MFNet: Multi-level fusion aware feature pyramid based multi-view stereo network for 3D reconstruction

  • Published:
Applied Intelligence Aims and scope Submit manuscript

Abstract

We present an efficient multi-view stereo (MVS) network for 3D reconstruction from multi-view images. While the existing state-of-the-art methods have achieved satisfactory results, the accuracy and scalability remain an open problem due to unreliable dense matching and memory-consuming cost volume regularization. To this end, we propose a multi-level fusion aware feature pyramid based multi-view stereo network (MFNet) for reliable depth inference. First, we adopt a coarse-to-fine strategy that achieves high-resolution depth estimation based on the coarse depth map. This strategy gradually narrows the depth search interval by using the prior information from the previous stage, which dramatically reduces memory consumption. Second, we conduct multi-level fusions to construct the feature pyramid such that the different level features receive information from each other, thus enabling rich multi-level feature representations. Finally, the group-wise correlation similarity measure is introduced to replace the variance-based approach used in previous works for cost volume construction, resulting in a lightweight and effective cost volume representation. Experimental results on the DTU, Tanks & Temples, and BlendedMVS benchmark datasets show that MFNet achieves better results than the state-of-the-art methods.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8

Similar content being viewed by others

References

  1. Seitz SM, Curless B, Diebel J, Scharstein D, Szeliski R (2006) A comparison and evaluation of multi-view stereo reconstruction algorithms. In: IEEE Conference on computer vision and pattern recognition. vol 1, pp 519–528

  2. Galliani S, Lasinger K, Schindler K (2015) Massively parallel multiview stereopsis by surface normal diffusion. In: IEEE Conference on computer vision and pattern recognition. pp 873–881

  3. Tola E, Strecha C, Fua P (2012) Efficient large-scale multi-view stereo for ultra high-resolution image sets. Mach Vis Appl 23:903–920

    Article  Google Scholar 

  4. Furukawa Y (2010) Accurate, dense, and robust multiview stereopsis. IEEE Trans Pattern Anal Mach Intell 32(8):1362–1376

    Article  Google Scholar 

  5. Yao Y, Luo Z, Li S, Fang T, Quan L (2018) Mvsnet: depth inference for unstructured multi-view stereo. In: European conference on computer vision. pp 785–801

  6. Yao Y, Luo Z, Li S, Shen T, Fang T, Quan L (2019) Recurrent mvsnet for high-resolution multi-view stereo depth inference. In: IEEE Conference on computer vision and pattern recognition. pp 5525–5534

  7. Chen R, Han S, Xu J, Su H (2019) Point-based multi-view stereo network. In: IEEE International conference on computer vision. pp 1538–1547

  8. Yan J, Wei Z, Yi H, Ding M, Zhang R, Chen Y, Wang G, Tai Y-W (2020) Dense hybrid recurrent multi-view stereo net with dynamic consistency checking. In: European conference on computer vision. pp 674–689

  9. Yang J, Mao W, Alvarez JM, Liu M (2021) Cost volume pyramid based depth inference for multi-view stereo. IEEE transactions on pattern analysis and machine intelligence

  10. Gu X, Fan Z, Zhu S, Dai Z, Tan F, Tan P (2020) Cascade cost volume for high-resolution multi-view stereo and stereo matching. In: IEEE Conference on computer vision and pattern recognition. pp 2495–2504

  11. Aanaes H, Jensen RR, Vogiatzis G, Tola E, Dahl AB (2016) Large-scale data for multiple-view stereopsis. Int J Comput Vis 120(2):153–168

    Article  Google Scholar 

  12. Knapitsch A, Park J, Zhou Q-Y, Koltun V (2017) Tanks and temples: benchmarking large-scale scene reconstruction. ACM Trans Graph 36(4):1–13

    Article  Google Scholar 

  13. Yao Y, Luo Z, Li S, Zhang J, Ren Y, Zhou L, Fang T, Quan L (2020) Blendedmvs: a large-scale dataset for generalized multi-view stereo networks. In: IEEE Conference on computer vision and pattern recognition. pp 1790–1799

  14. Sinha SN, Mordohai P, Pollefeys M (2007) Multi-view stereo via graph cuts on the dual of an adaptive tetrahedral mesh. In: IEEE Conference on computer vision and pattern recognition. pp 1–8

  15. Ulusoy AO, Black MJ, Geiger A (2017) Semantic multi-view stereo: jointly estimating objects and voxels. In: IEEE Conference on computer vision and pattern recognition. pp 4531–4540

  16. Cremers D, Kolev K (2011) Multiview stereo and silhouette consistency via convex functionals over convex domains. IEEE Trans Pattern Anal Mach Intell 33(6):1161–1174

    Article  Google Scholar 

  17. Li Z, Wang K, Zuo W, Meng D, Zhang L (2016) Detail-preserving and content-aware variational multi-view stereo reconstruction. IEEE Transactions on Image Processing 25(2):864–877

    Article  MATH  Google Scholar 

  18. Locher A, Perdoch M, Gool LV (2016) Progressive prioritized multi-view stereo. In: IEEE Conference on computer vision and pattern recognition. pp 3244–3252

  19. Xu Q, Tao W (2019) Multi-scale geometric consistency guided multi-view stereo. In: IEEE Conference on computer vision and pattern recognition. pp 5483–5492

  20. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: IEEE Conference on computer vision and pattern recognition. pp 770–778

  21. Qian K, Tian L, Liu Y, Wen X, Bao J (2021) Image robust recognition based on feature-entropy-oriented differential fusion capsule network. Appl Intell 51(2):1108–1117

    Article  Google Scholar 

  22. Xie E, Ding j, Wang W, Zhan X, Xu H, Sun P, Li Z, Luo P (2021) Detco: unsupervised contrastive learning for object detection. In: IEEE International conference on computer vision. pp 8392–8401

  23. Pal SK, Pramanik A, Maiti J, Mitra P (2021) Deep learning in multi-object detection and tracking: state of the art. Appl Intell 51(9):6400–6429

    Article  Google Scholar 

  24. Zhang X-L, Du B-C, Luo Z-C, Ma K (2021) Lightweight and efficient asymmetric network design for real-time semantic segmentation. Applied Intelligence. pp 1–16

  25. Hartmann W, Galliani S, Havlena M, Van Gool L, Schindler K (2017) Learned multi-patch similarity. In: IEEE International conference on computer vision. pp 1586–1594

  26. Kar A, Hane C (2017) Learning a multi-view stereo machine. In: Neural information processing systems. pp 365–376

  27. Ji M, Gall J, Zheng H, Liu Y, Fang L (2017) Surfacenet: an end-to-end 3d neural network for multiview stereopsis. In: IEEE International conference on computer vision. pp 2326–2334

  28. Ji M, Zhang J, Dai Q, Fang L (2020) surfacenet+: an end-to-end 3d neural network for very sparse multi-view stereopsis. IEEE Trans Pattern Anal Mach Intell 43(11):4078–4093

    Article  Google Scholar 

  29. Wei Z, Zhu Q, Min C, Chen Y, Wang G (2021) Aa-rmvsnet: adaptive aggregation recurrent multi-view stereo network. In: IEEE International conference on computer vision. pp 6187– 6196

  30. Yu Z, Gao S (2020) Fast-mvsnet: sparse-to-dense multi-view stereo with learned propagation and gauss-newton refinement. In: IEEE Conference on computer vision and pattern recognition. pp 1949–1958

  31. Cheng S, Xu Z, Zhu S, Li Z, Li LE, Ramamoorthi R, Su H (2020) Deep stereo using adaptive thin volume representation with uncertainty awareness. In: IEEE Conference on computer vision and pattern recognition. pp 2524–2534

  32. Xu Q, Tao W (2020) Learning inverse depth regression for multi-view stereo with correlation cost volume. In: National conference on artificial intelligence. vol 34, pp 12508–12515

  33. Sun K, Xiao B, Liu D, Wang J (2019) Deep high-resolution representation learning for human pose estimation. In: IEEE Conference on computer vision and pattern recognition. pp 5693–5703

  34. Zhang T, Qi G-J, Xiao B, Wang J (2017) Interleaved group convolutions. In: IEEE International conference on computer vision. pp 4383–4392

  35. Zhao L, Li M, Meng D, Li X, Zhang Z, Zhuang Y, Tu Z, Wang J (2018) Deep convolutional neural networks with merge-and-run mappings. In: International joint conference on artificial intelligence. pp 3170–3176

  36. Ronneberger O, Fischer P, Brox T (2015) U-net: convolutional networks for biomedical image segmentation. In: Medical image computing and computer-assisted intervention. pp 234–241

  37. Lin T-Y, Dollar P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: IEEE Conference on computer vision and pattern recognition. pp 936–944

  38. Guo X, Yang K, Yang W, Wang X, Li H (2019) Group-wise correlation stereo network. In: IEEE Conference on computer vision and pattern recognition. pp 3273–3282

  39. Campbell ND, Vogiatzis G, Hernández C, Cipolla R (2008) Using multiple hypotheses to improve depth-maps for multi-view stereo. In: European conference on computer vision. pp 766–779

  40. Luo K, Guan T, Ju L, Huang H, Luo Y (2019) P-mvsnet: learning patch-wise matching confidence aggregation for multi-view stereo. In: IEEE International conference on computer vision. pp 10451–10460

  41. Li Y, Zhao Z, Fan J, Li W (2022) Adr-mvsnet: a novel cascade network for 3d point cloud reconstruction with pixel occlusion. Pattern recognition 108516

  42. Schonberger JL, Frahm J-M (2016) Structure-from-motion revisited. In: IEEE Conference on computer vision and pattern recognition. pp 4104–4113

Download references

Acknowledgements

This work was supported in part by the National Key Research and Development Program of China under Grant 2020YFC1523100 and in part the National Natural Science Foundation of China under Grant 61877016.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Xiaoping Liu.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Cai, Y., Li, L., Wang, D. et al. MFNet: Multi-level fusion aware feature pyramid based multi-view stereo network for 3D reconstruction. Appl Intell 53, 4289–4301 (2023). https://doi.org/10.1007/s10489-022-03754-3

Download citation

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10489-022-03754-3

Keywords

Navigation