Abstract
Deep multi-view stereo (MVS) approaches generally construct 3D cost volumes to regularize and regress the depth map. These methods are limited for high-resolution outputs because memory and time costs grow cubically as the volume resolution increases. In this paper, we present a multi-stage iterative depth map estimation method for MVS. In our network, the cost volume is iteratively processed by lightweight GRU modules based on 2D convolutions, and a multi-stage coarse-to-fine structure is adopted to speed up depth estimation. To further improve 3D reconstruction quality, we improve adaptive cost aggregation from two perspectives: a view-adaptive weighting module accounts for occlusion during cost volume fusion, and a spatially adaptive deformable geometric feature encoding module is applied to the cost volume features before they are fed into the GRUs, for stronger modeling capability. Experiments on the DTU dataset demonstrate that the proposed network is accurate and efficient.
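The cubic growth mentioned in the abstract can be illustrated with a short back-of-the-envelope calculation. The sketch below is not taken from the paper: the function name `cost_volume_bytes`, the volume layout (channels × depth planes × height × width), and the example sizes are all assumptions chosen for illustration.

```python
def cost_volume_bytes(height, width, depth_planes, channels=8, bytes_per_value=4):
    """Approximate memory of one dense cost volume.

    Assumed (hypothetical) layout: channels x depth_planes x height x width,
    stored as 32-bit floats. Real networks differ in layout and precision.
    """
    return channels * depth_planes * height * width * bytes_per_value


# Illustrative sizes only; they are not the paper's settings.
base = cost_volume_bytes(height=128, width=160, depth_planes=48)

# Doubling the resolution along height, width, and depth at the same time.
doubled = cost_volume_bytes(height=256, width=320, depth_planes=96)

# Each of the three axes contributes a factor of 2, so memory grows by 2**3.
growth = doubled / base
print(f"base: {base / 2**20:.1f} MiB, doubled: {doubled / 2**20:.1f} MiB, growth: {growth:.0f}x")
```

Under these assumptions, doubling the resolution along all three axes multiplies the memory by eight, which is the kind of scaling that motivates processing the cost volume with 2D convolutions in a coarse-to-fine structure.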
Acknowledgement
This work is supported by the National Natural Science Foundation of China (No. 62276016).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, X., Bai, X., Wang, C. (2023). Adaptive Cost Aggregation in Iterative Depth Estimation for Efficient Multi-view Stereo. In: Lu, H., et al. Image and Graphics. ICIG 2023. Lecture Notes in Computer Science, vol 14356. Springer, Cham. https://doi.org/10.1007/978-3-031-46308-2_3
DOI: https://doi.org/10.1007/978-3-031-46308-2_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-46307-5
Online ISBN: 978-3-031-46308-2
eBook Packages: Computer Science, Computer Science (R0)