Abstract
This paper proposes PSP-MVSNet for the depth inference problem in multi-view stereo (MVS). We first introduce a novel patch-based similarity perceptual (PSP) module for effectively constructing the 3D cost volume. Unlike previous methods that use variance-based operators to fuse the feature volumes of different views, our method uses a cosine similarity measure to compute matching scores for pairs of deep feature vectors and then treats these scores as weights when constructing the 3D cost volume. This design is motivated by the observation that many factors that degrade performance, such as illumination changes and occlusions, cause pixel differences between multi-view images. We demonstrate that a patch-based cosine similarity can serve as explicit supervision for feature learning and helps speed up convergence. Furthermore, to adaptively set different depth ranges for different pixels, we extend an existing dynamic depth range searching method with a simple yet effective improvement. This improved searching method lets us train our model in an end-to-end manner and further improves performance. Experimental results show that our method achieves state-of-the-art performance on the DTU dataset and comparable results on the intermediate set of the Tanks and Temples dataset.
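The similarity-weighting idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the exact formulation, so everything below is an assumption made for illustration: the function names (`cosine`, `patch_similarity`, `fuse_views`), the 3x3 patch, the clamping of negative scores to zero, and the normalised weighted average. Source features are assumed to be already warped into the reference view for a single depth hypothesis.

```python
import math

def cosine(u, v, eps=1e-8):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)

def patch_similarity(ref, src, y, x, radius=1):
    """Mean cosine similarity over the patch centred at pixel (y, x).

    ref and src are H x W x C nested lists. src is assumed to be already
    warped into the reference view (hypothetical setup, one depth hypothesis).
    """
    h, w = len(ref), len(ref[0])
    scores = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            yy, xx = y + dy, x + dx
            if 0 <= yy < h and 0 <= xx < w:  # skip neighbours outside the image
                scores.append(cosine(ref[yy][xx], src[yy][xx]))
    return sum(scores) / len(scores)

def fuse_views(ref, warped_srcs, y, x, radius=1):
    """Similarity-weighted average of source features at pixel (y, x).

    Assumption: negative scores are clamped to zero and the weights are
    normalised to sum to one. The paper's actual fusion rule may differ.
    """
    weights = [max(patch_similarity(ref, s, y, x, radius), 0.0)
               for s in warped_srcs]
    total = sum(weights)
    channels = len(ref[y][x])
    if total == 0.0:
        return [0.0] * channels  # no view agrees with the reference
    return [sum(wt * s[y][x][c] for wt, s in zip(weights, warped_srcs)) / total
            for c in range(channels)]
```

In this sketch, a source view whose patch disagrees with the reference (for example because of an occlusion) receives a weight near zero and contributes little to the fused feature, whereas a variance-based operator would treat all views equally.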
Acknowledgement
This work was supported by the National Natural Science Foundation of China (62076029), the Natural Science Foundation of Guangdong Province (2022B1212010006, 2017A030313362) and internal funds of the United International College (R202012, R201802, UICR0400025-21).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Jie, L., Zhang, H. (2022). PSP-MVSNet: Deep Patch-Based Similarity Perceptual for Multi-view Stereo Depth Inference. In: Pimenidis, E., Angelov, P., Jayne, C., Papaleonidas, A., Aydin, M. (eds) Artificial Neural Networks and Machine Learning – ICANN 2022. ICANN 2022. Lecture Notes in Computer Science, vol 13529. Springer, Cham. https://doi.org/10.1007/978-3-031-15919-0_27
DOI: https://doi.org/10.1007/978-3-031-15919-0_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-15918-3
Online ISBN: 978-3-031-15919-0
eBook Packages: Computer Science, Computer Science (R0)