Self-supervised Edge Structure Learning for Multi-view Stereo and Parallel Optimization

Li, Pan; Wu, Suping; Zhang, Xitie; Peng, Yuxin; Zhang, Boyang; Wang, Bin

doi:10.1007/978-3-031-53311-2_33

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14556)

Included in the following conference series:

International Conference on Multimedia Modeling

Abstract

Recent self-supervised methods have made clear progress on multi-view stereo (MVS). However, existing methods ignore the edge structure of the reconstructed target, which includes both the outer silhouette and the edges of the internal structure. This can degrade the edges and the completeness of the reconstruction. To address this problem, we propose an extractor that produces edge structure maps, and we design an edge structure loss that constrains the network to pay more attention to the edge structure features of the reference view, improving the texture details of the reconstruction. Specifically, following the idea of cost volume construction in multi-view stereo, we warp the edge structure map of the source view to the reference view to provide reliable self-supervision. In addition, we design a masking mechanism that combines local and global properties, which ensures robustness and improves the reconstruction completeness of the model on complex samples. Furthermore, we adopt an effective parallel acceleration approach to improve training speed and reconstruction efficiency. Extensive experiments on the DTU and Tanks & Temples benchmarks demonstrate that our method improves both accuracy and completeness compared with other unsupervised methods. In addition, our parallel method improves efficiency while preserving accuracy. The code will be published.
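The edge structure supervision described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the extractor, the warping code, or the exact form of the loss or mask. The sketch assumes a Sobel gradient magnitude as a stand-in extractor, assumes the source-view edge map has already been warped into the reference view, and assumes a masked mean absolute difference as the loss. The names `sobel_edge_map` and `masked_edge_loss` are hypothetical.

```python
import numpy as np


def sobel_edge_map(image: np.ndarray) -> np.ndarray:
    """Gradient-magnitude edge map of a 2-D grayscale image.

    Assumption: a Sobel filter stands in for the paper's edge structure
    extractor, which the abstract does not describe.
    """
    kx = np.array([[-1.0, 0.0, 1.0],
                   [-2.0, 0.0, 2.0],
                   [-1.0, 0.0, 1.0]])
    ky = kx.T
    h, w = image.shape
    # Replicate border pixels so the output has the same size as the input.
    padded = np.pad(image.astype(np.float64), 1, mode="edge")
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    # Accumulate the 3x3 filter responses one kernel tap at a time.
    for dy in range(3):
        for dx in range(3):
            window = padded[dy:dy + h, dx:dx + w]
            gx += kx[dy, dx] * window
            gy += ky[dy, dx] * window
    return np.hypot(gx, gy)


def masked_edge_loss(ref_edges: np.ndarray,
                     warped_src_edges: np.ndarray,
                     valid_mask: np.ndarray) -> float:
    """Mean absolute edge difference over pixels where the mask is nonzero.

    Assumption: `warped_src_edges` is the source-view edge map already
    warped into the reference view, and `valid_mask` marks pixels whose
    warp is trusted. Returns 0.0 when no pixel is valid.
    """
    mask = valid_mask.astype(np.float64)
    n_valid = mask.sum()
    if n_valid == 0:
        return 0.0
    diff = np.abs(ref_edges - warped_src_edges) * mask
    return float(diff.sum() / n_valid)


# Usage: a vertical step edge compared with itself gives zero loss.
step = np.zeros((6, 6))
step[:, 3:] = 1.0
edges = sobel_edge_map(step)
loss_same = masked_edge_loss(edges, edges, np.ones_like(edges))
```

In this sketch the mask simply excludes pixels from the average; the local and global masking mechanism mentioned in the abstract would determine how such a mask is built.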

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant 62062056, and in part by the Ningxia Graduate Education and Teaching Reform Research and Practice Project 2021.

Author information

Authors and Affiliations

Ningxia University, Ningxia, 750000, China
Pan Li, Suping Wu, Xitie Zhang, Yuxin Peng, Boyang Zhang & Bin Wang

Corresponding author

Correspondence to Suping Wu.

Editor information

Editors and Affiliations

University of Amsterdam, Amsterdam, The Netherlands
Stevan Rudinac
Delft University of Technology, Delft, The Netherlands
Alan Hanjalic
Delft University of Technology, Delft, The Netherlands
Cynthia Liem
University of Amsterdam, Amsterdam, The Netherlands
Marcel Worring
Reykjavik University, Reykjavik, Iceland
Björn Þór Jónsson
Microsoft Research Lab – Asia, Beijing, China
Bei Liu
The University of Tokyo, Tokyo, Japan
Yoko Yamakata

About this paper

Cite this paper

Li, P., Wu, S., Zhang, X., Peng, Y., Zhang, B., Wang, B. (2024). Self-supervised Edge Structure Learning for Multi-view Stereo and Parallel Optimization. In: Rudinac, S., et al. MultiMedia Modeling. MMM 2024. Lecture Notes in Computer Science, vol 14556. Springer, Cham. https://doi.org/10.1007/978-3-031-53311-2_33

DOI: https://doi.org/10.1007/978-3-031-53311-2_33
Published: 28 January 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-53310-5
Online ISBN: 978-3-031-53311-2
eBook Packages: Computer Science, Computer Science (R0)
