PSP-MVSNet: Deep Patch-Based Similarity Perceptual for Multi-view Stereo Depth Inference

Jie, Leiping; Zhang, Hui

doi:10.1007/978-3-031-15919-0_27

Conference paper
First Online: 07 September 2022

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13529)

Abstract

This paper proposes PSP-MVSNet for the depth inference problem in multi-view stereo (MVS). We first introduce a novel patch-based similarity perceptual (PSP) module for effectively constructing the 3D cost volume. Unlike previous methods that use variance-based operators to fuse the feature volumes of different views, our method uses a cosine similarity measure to compute matching scores for pairs of deep feature vectors and then treats these scores as weights for constructing the 3D cost volume. This design is based on the observation that many performance degradation factors, e.g., illumination changes or occlusions, lead to pixel differences between multi-view images. We demonstrate that a patch-based cosine similarity can serve as explicit supervision for feature learning and can help speed up convergence. Furthermore, to adaptively set different depth ranges for different pixels, we extend an existing dynamic depth range searching method with a simple yet effective improvement. This improved searching method lets us train our model end to end and further improves its performance. Experimental results show that our method achieves state-of-the-art performance on the DTU dataset and competitive results on the intermediate set of the Tanks and Temples dataset.
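
The idea of using patch-based cosine similarity scores as fusion weights can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the patch size, how scores are normalised, or how they enter the cost volume, so the 3x3 patch, the clamping of negative scores to zero, and the function names `cosine_similarity`, `patch_similarity` and `fuse_views` are all assumptions made for illustration. Source features are assumed to be already warped into the reference view for a single depth hypothesis.

```python
import math


def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def patch_similarity(ref, src, row, col, radius=1):
    """Mean cosine similarity over a square patch centred at (row, col).

    ref and src are H x W grids (nested lists) of feature vectors.
    Patch positions that fall outside the grid are skipped.
    """
    height, width = len(ref), len(ref[0])
    scores = []
    for r in range(row - radius, row + radius + 1):
        for c in range(col - radius, col + radius + 1):
            if 0 <= r < height and 0 <= c < width:
                scores.append(cosine_similarity(ref[r][c], src[r][c]))
    return sum(scores) / len(scores)


def fuse_views(ref, warped_sources, row, col, radius=1):
    """Similarity-weighted average of source features at one pixel.

    Assumption: negative similarities are clamped to zero so that
    dissimilar (e.g. occluded) views contribute nothing.
    """
    weights = [
        max(0.0, patch_similarity(ref, src, row, col, radius))
        for src in warped_sources
    ]
    total = sum(weights) + 1e-8
    channels = len(ref[row][col])
    return [
        sum(w * src[row][col][k] for w, src in zip(weights, warped_sources)) / total
        for k in range(channels)
    ]


# Toy 2x2 feature grids with 2 channels per pixel.
ref = [[[1.0, 0.0] for _ in range(2)] for _ in range(2)]
consistent = [[[1.0, 0.0] for _ in range(2)] for _ in range(2)]
inconsistent = [[[-1.0, 0.0] for _ in range(2)] for _ in range(2)]
fused = fuse_views(ref, [consistent, inconsistent], row=0, col=0)
```

In this toy case the inconsistent view has a patch similarity of about -1, so its clamped weight is zero and the fused feature is driven by the consistent view. A variance-based operator would instead treat both views equally.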

Acknowledgement

This work was supported by the National Natural Science Foundation of China (62076029), the Natural Science Foundation of Guangdong Province (2022B1212010006, 2017A030313362) and internal funds of the United International College (R202012, R201802, UICR0400025-21).

Author information

Authors and Affiliations

Department of Computer Science, Hong Kong Baptist University, Hong Kong SAR, China
Leiping Jie
Guangdong Key Laboratory of Interdisciplinary Research and Application for Data Science, BNU-HKBU United International College, Zhuhai, China
Leiping Jie & Hui Zhang

Corresponding author

Correspondence to Hui Zhang.

Editor information

Editors and Affiliations

University of the West of England, Bristol, UK
Elias Pimenidis
Lancaster University, Lancaster, UK
Plamen Angelov
Digital Innovation, Teesside University, Middlesbrough, UK
Chrisina Jayne
Democritus University of Thrace, Xanthi, Greece
Antonios Papaleonidas
The University of the West of England, Bristol, UK
Mehmet Aydin

Cite this paper

Jie, L., Zhang, H. (2022). PSP-MVSNet: Deep Patch-Based Similarity Perceptual for Multi-view Stereo Depth Inference. In: Pimenidis, E., Angelov, P., Jayne, C., Papaleonidas, A., Aydin, M. (eds) Artificial Neural Networks and Machine Learning – ICANN 2022. ICANN 2022. Lecture Notes in Computer Science, vol 13529. Springer, Cham. https://doi.org/10.1007/978-3-031-15919-0_27

DOI: https://doi.org/10.1007/978-3-031-15919-0_27
Published: 07 September 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-15918-3
Online ISBN: 978-3-031-15919-0
eBook Packages: Computer Science, Computer Science (R0)
