Self-supervised Edge Structure Learning for Multi-view Stereo and Parallel Optimization

  • Conference paper
MultiMedia Modeling (MMM 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14556)

Abstract

Recent self-supervised methods have made clear progress on multi-view stereo (MVS). However, existing methods ignore the edge structure information of the reconstructed target, which includes both its outer silhouette and the edges of its internal structure. This can degrade the edges and the completeness of the reconstruction. To address this problem, we propose an extractor that produces edge structure maps, and we design an edge structure loss that constrains the network to pay more attention to the edge structure features of the reference view, improving the texture details of the reconstruction. Specifically, we borrow the idea of cost volume construction in multi-view stereo and warp the edge structure map of the source view to the reference view to provide reliable self-supervision. In addition, we design a masking mechanism that combines local and global properties, which ensures robustness and improves reconstruction completeness on complex samples. Furthermore, we adopt an effective parallel acceleration approach to improve training speed and reconstruction efficiency. Extensive experiments on the DTU and Tanks & Temples benchmarks demonstrate that our method improves both accuracy and completeness compared with other unsupervised methods. In addition, our parallel method improves efficiency while preserving accuracy. The code will be published.
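
The warping step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It makes several assumptions: the edge extractor is replaced by a plain gradient magnitude, both views share one intrinsic matrix K, sampling is nearest-neighbor, and the mask is a simple visibility mask rather than the local and global masking mechanism of the paper. The function names (edge_map, warp_to_reference, edge_structure_loss) are hypothetical.

```python
import numpy as np

def edge_map(img):
    """Gradient-magnitude edge map (a stand-in for the paper's extractor)."""
    gy, gx = np.gradient(img.astype(np.float64))
    return np.hypot(gx, gy)

def warp_to_reference(src_edge, depth_ref, K, R, t):
    """Sample src_edge where each reference pixel projects into the source view.

    R, t map reference-camera coordinates to source-camera coordinates.
    Assumes both views share the intrinsics K (an illustrative simplification).
    Returns the warped edge map and a boolean visibility mask.
    """
    h, w = depth_ref.shape
    v, u = np.mgrid[0:h, 0:w]
    pix = np.stack([u.ravel(), v.ravel(), np.ones(h * w)])  # 3 x N homogeneous pixels
    pts_ref = (np.linalg.inv(K) @ pix) * depth_ref.ravel()  # back-project with depth
    pts_src = R @ pts_ref + t.reshape(3, 1)                 # move to the source camera
    proj = K @ pts_src                                      # project into the source image
    z = proj[2]
    valid = z > 1e-8                                        # points in front of the camera
    us = np.full(h * w, -1.0)
    vs = np.full(h * w, -1.0)
    us[valid] = proj[0, valid] / z[valid]
    vs[valid] = proj[1, valid] / z[valid]
    ui = np.rint(us).astype(int)                            # nearest-neighbor sampling
    vi = np.rint(vs).astype(int)
    valid &= (ui >= 0) & (ui < w) & (vi >= 0) & (vi < h)    # inside the source image
    warped = np.zeros(h * w)
    warped[valid] = src_edge[vi[valid], ui[valid]]
    return warped.reshape(h, w), valid.reshape(h, w)

def edge_structure_loss(ref_edge, warped_edge, mask):
    """Masked mean absolute difference between reference and warped edge maps."""
    if not mask.any():
        return 0.0
    return float(np.abs(ref_edge - warped_edge)[mask].mean())

# Usage: with an identity pose the warped map reproduces the source edge map.
img = np.zeros((6, 8))
img[2:5, 3:6] = 1.0
K = np.array([[4.0, 0.0, 3.5], [0.0, 4.0, 2.5], [0.0, 0.0, 1.0]])
depth = np.full((6, 8), 2.0)
edges = edge_map(img)
warped, mask = warp_to_reference(edges, depth, K, np.eye(3), np.zeros(3))
loss = edge_structure_loss(edges, warped, mask)
```

In a training setting the depth would come from the network prediction and the sampling would need to be differentiable (for example bilinear), so that the loss can supervise the depth estimate. The nearest-neighbor version here only shows the geometry.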

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant 62062056 and in part by the Ningxia Graduate Education and Teaching Reform Research and Practice Project 2021.

Author information

Corresponding author

Correspondence to Suping Wu.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Li, P., Wu, S., Zhang, X., Peng, Y., Zhang, B., Wang, B. (2024). Self-supervised Edge Structure Learning for Multi-view Stereo and Parallel Optimization. In: Rudinac, S., et al. MultiMedia Modeling. MMM 2024. Lecture Notes in Computer Science, vol 14556. Springer, Cham. https://doi.org/10.1007/978-3-031-53311-2_33

  • DOI: https://doi.org/10.1007/978-3-031-53311-2_33

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-53310-5

  • Online ISBN: 978-3-031-53311-2

  • eBook Packages: Computer Science, Computer Science (R0)
