Adaptive Cost Aggregation in Iterative Depth Estimation for Efficient Multi-view Stereo

  • Conference paper
Image and Graphics (ICIG 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14356)

Abstract

Deep multi-view stereo (MVS) approaches generally construct 3D cost volumes to regularize and regress the depth map. These methods are limited in producing high-resolution outputs, since their memory and time costs grow cubically as the volume resolution increases. In this paper, we present a multi-stage iterative depth map estimation method for MVS. In our network, the cost volume is iteratively processed by lightweight GRU modules built on 2D convolutions, and a multi-stage coarse-to-fine structure is adopted to speed up depth estimation. To further improve 3D reconstruction quality, we improve adaptive cost aggregation from two perspectives: a view-adaptive weighting module is proposed to handle occlusion during cost volume fusion, and a spatially adaptive deformable geometric feature encoding module is introduced to encode cost volume features before they are fed into the GRUs, giving the network stronger modeling capability. Experiments on the DTU dataset demonstrate the effectiveness of the proposed network in terms of accuracy, together with remarkable efficiency.
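The abstract does not specify how the view-adaptive weighting module is built, so the following is only a minimal sketch of the general idea of occlusion-aware, per-view weighted cost fusion, written in Python with NumPy. It assumes that source-view features have already been warped to the reference view at each depth hypothesis (warping is not shown), computes per-view costs with group-wise correlation, and replaces the paper's weighting module with a hypothetical hand-crafted visibility score: the peak of the per-pixel softmax over depth. All function names are illustrative and are not taken from the paper.

```python
import numpy as np


def groupwise_correlation(ref, warped, num_groups):
    """Group-wise correlation between reference and warped source features.

    ref:    (C, H, W) reference-view features.
    warped: (D, C, H, W) source features, ASSUMED already warped to the
            reference view at D depth hypotheses.
    Returns a (D, G, H, W) cost volume for this source view.
    """
    d, c, h, w = warped.shape
    assert c % num_groups == 0, "channels must split evenly into groups"
    prod = warped * ref[None]  # broadcast ref over the depth axis
    # Average the channel-wise products inside each group of C // G channels.
    return prod.reshape(d, num_groups, c // num_groups, h, w).mean(axis=2)


def visibility_score(cost):
    """HYPOTHETICAL stand-in for a learned per-view weight.

    A view with a sharp, unambiguous match along depth scores high; an
    occluded view with flat or noisy costs scores low.
    cost: (D, G, H, W) -> score: (H, W), always in [1/D, 1].
    """
    s = cost.mean(axis=1)                    # (D, H, W), average over groups
    s = s - s.max(axis=0, keepdims=True)     # numerically stable softmax
    e = np.exp(s)
    prob = e / e.sum(axis=0, keepdims=True)  # softmax over depth hypotheses
    return prob.max(axis=0)                  # peak probability per pixel


def view_adaptive_fusion(ref, warped_views, num_groups=4):
    """Fuse per-view cost volumes with per-pixel, per-view weights.

    warped_views: (V, D, C, H, W).
    Returns the fused cost (D, G, H, W) and weights (V, H, W) that sum
    to 1 over views at every pixel.
    """
    costs = np.stack([groupwise_correlation(ref, wv, num_groups)
                      for wv in warped_views])           # (V, D, G, H, W)
    scores = np.stack([visibility_score(c) for c in costs])  # (V, H, W)
    # Scores are >= 1/D > 0, so the normalization never divides by zero.
    weights = scores / scores.sum(axis=0, keepdims=True)
    fused = (weights[:, None, None] * costs).sum(axis=0)  # weighted average
    return fused, weights


# Toy demo: one source view matches the reference at depth index 2,
# the other contains only noise, mimicking an occluded view.
rng = np.random.default_rng(0)
C, H, W, D = 16, 8, 8, 5
ref = rng.standard_normal((C, H, W))
good = rng.standard_normal((D, C, H, W))
good[2] = ref
occluded = rng.standard_normal((D, C, H, W))
fused, weights = view_adaptive_fusion(ref, np.stack([good, occluded]))
print(fused.shape, weights[0].mean(), weights[1].mean())
```

In this toy setting the view containing a consistent match receives a higher average weight than the noise-only view, which is the behavior occlusion-aware fusion aims for. A learned weighting module would typically replace the fixed softmax-peak rule, and the fused volume would then be encoded and passed to the GRU-based iterative updates, which are not shown here.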

Acknowledgement

This work is supported by the National Natural Science Foundation of China (No. 62276016).

Author information

Corresponding author

Correspondence to Xiao Bai.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Wang, X., Bai, X., Wang, C. (2023). Adaptive Cost Aggregation in Iterative Depth Estimation for Efficient Multi-view Stereo. In: Lu, H., et al. Image and Graphics. ICIG 2023. Lecture Notes in Computer Science, vol 14356. Springer, Cham. https://doi.org/10.1007/978-3-031-46308-2_3

  • DOI: https://doi.org/10.1007/978-3-031-46308-2_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-46307-5

  • Online ISBN: 978-3-031-46308-2

  • eBook Packages: Computer Science, Computer Science (R0)
