Adaptive Cost Aggregation in Iterative Depth Estimation for Efficient Multi-view Stereo

Wang, Xiang; Bai, Xiao; Wang, Chen

doi:10.1007/978-3-031-46308-2_3

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14356)

Included in the following conference series:

International Conference on Image and Graphics

Abstract

Deep multi-view stereo (MVS) approaches generally construct 3D cost volumes to regularize and regress the depth map. These methods scale poorly to high-resolution outputs because memory and time costs grow cubically as the volume resolution increases. In this paper, we present a multi-stage iterative depth map estimation method for MVS. In our network, the cost volume is iteratively processed by lightweight GRU modules based on 2D convolutions, and a multi-stage coarse-to-fine structure is adopted to speed up depth estimation. To further improve 3D reconstruction quality, we improve adaptive cost aggregation from two perspectives: a view-adaptive weighting module accounts for occlusion in cost volume fusion, and a spatial-adaptive deformable geometric feature encoding module is applied to the cost volume features before they are fed into the GRUs, for stronger modeling capability. Experiments on the DTU dataset demonstrate that the proposed network is accurate while remaining remarkably efficient.
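As a rough illustration of what view-weighted cost volume fusion can look like, the sketch below averages per-view cost volumes with per-pixel visibility weights, so that occluded views contribute less. It is not the authors' module: in the paper the weights come from a learned network, whereas here they are simply passed in, and the function name `fuse_cost_volumes`, the array layout, and the `eps` parameter are assumptions made for this example.

```python
import numpy as np


def fuse_cost_volumes(costs, weights, eps=1e-6):
    """Fuse per-view cost volumes with per-pixel view weights.

    costs:   array of shape (V, D, H, W), one cost volume per source view
             (V views, D depth hypotheses, H x W pixels). Hypothetical layout.
    weights: array of shape (V, H, W), non-negative visibility weights.
             In a learned system these would be predicted by a network;
             here they are plain inputs.
    Returns the weighted average over views, shape (D, H, W).
    """
    costs = np.asarray(costs, dtype=np.float64)
    weights = np.asarray(weights, dtype=np.float64)
    # Insert a depth axis so each pixel's weight applies to all hypotheses.
    w = weights[:, None, :, :]
    # eps guards against division by zero where every view has zero weight.
    return (w * costs).sum(axis=0) / (w.sum(axis=0) + eps)


if __name__ == "__main__":
    # Two views, 2 depth hypotheses, 3x3 pixels.
    costs = np.stack([np.zeros((2, 3, 3)), np.ones((2, 3, 3))])
    # Treat the second view as fully occluded everywhere.
    weights = np.stack([np.ones((3, 3)), np.zeros((3, 3))])
    fused = fuse_cost_volumes(costs, weights)
    print(fused.shape)
```

With equal weights this reduces to a plain mean over views. Setting a view's weight to zero at a pixel removes that view's contribution there, which is the basic intuition behind down-weighting occluded views.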

Acknowledgement

This work is supported by the National Natural Science Foundation of China (No. 62276016).

Author information

Authors and Affiliations

School of Computer Science and Engineering, State Key Laboratory of Software Development Environment, Jiangxi Research Institute, Beihang University, Beijing, China
Xiang Wang, Xiao Bai & Chen Wang

Corresponding author

Correspondence to Xiao Bai.

Editor information

Editors and Affiliations

Dalian University of Technology, Dalian, China
Huchuan Lu
University of Sydney, Sydney, NSW, Australia
Wanli Ouyang
Shenzhen University, Shenzhen, China
Hui Huang
Tsinghua University, Beijing, China
Jiwen Lu
Dalian University of Technology, Dalian, China
Risheng Liu
Institute of Automation, CAS, Beijing, China
Jing Dong
University of Technology Sydney, Sydney, NSW, Australia
Min Xu

About this paper

Cite this paper

Wang, X., Bai, X., Wang, C. (2023). Adaptive Cost Aggregation in Iterative Depth Estimation for Efficient Multi-view Stereo. In: Lu, H., et al. Image and Graphics. ICIG 2023. Lecture Notes in Computer Science, vol 14356. Springer, Cham. https://doi.org/10.1007/978-3-031-46308-2_3

DOI: https://doi.org/10.1007/978-3-031-46308-2_3
Published: 30 October 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-46307-5
Online ISBN: 978-3-031-46308-2
eBook Packages: Computer Science, Computer Science (R0)
