Abstract
Recent self-supervised methods have made clear progress on multi-view stereo (MVS). However, existing methods ignore the edge structure of the reconstructed target, which comprises both its outer silhouette and the edges of its internal structure. As a result, reconstructions can have poorly defined edges and reduced completeness. To address this problem, we propose an extractor that produces edge structure maps, and we design an edge structure loss that constrains the network to pay more attention to the edge structure features of the reference view, improving the texture details of the reconstruction. Specifically, following the idea of cost volume construction in multi-view stereo, we warp the edge structure map of the source view to the reference view to provide reliable self-supervision. In addition, we design a masking mechanism that combines local and global properties, which ensures robustness and improves the reconstruction completeness of the model on complex samples. Furthermore, we adopt an effective parallel acceleration approach to improve training speed and reconstruction efficiency. Extensive experiments on the DTU and Tanks & Temples benchmarks demonstrate that our method improves both accuracy and completeness compared with other unsupervised methods. In addition, our parallel method improves efficiency while maintaining accuracy. The code will be published.
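The warping-based supervision described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and every name and design choice in it is an assumption made for illustration: a Sobel gradient magnitude stands in for the proposed edge structure extractor, a rectified two-view setup (a horizontal shift of `fb / depth` pixels, with `fb` a hypothetical focal-length-times-baseline constant) stands in for the general warp used in cost volume construction, and a simple out-of-bounds mask stands in for the local/global masking mechanism.

```python
import math

def sobel_edges(img):
    """Edge map as 3x3 Sobel gradient magnitude (border pixels left at 0).
    Assumption: a stand-in for the paper's edge structure extractor."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            out[y][x] = math.sqrt(gx * gx + gy * gy)
    return out

def warp_to_reference(src_map, depth, fb):
    """Warp a source-view map into the reference view.
    Assumption: rectified views, so a reference pixel (x, y) with depth d
    is sampled from source column x - fb / d (linear interpolation)."""
    h, w = len(src_map), len(src_map[0])
    warped = [[0.0] * w for _ in range(h)]
    valid = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            xs = x - fb / depth[y][x]
            x0 = math.floor(xs)
            if 0 <= x0 and x0 + 1 <= w - 1:
                t = xs - x0
                warped[y][x] = (1 - t) * src_map[y][x0] + t * src_map[y][x0 + 1]
                valid[y][x] = True  # sample fell inside the source image
    return warped, valid

def edge_structure_loss(ref_edges, warped_edges, valid):
    """Mean absolute difference between edge maps over valid pixels."""
    total, count = 0.0, 0
    for ref_row, warp_row, valid_row in zip(ref_edges, warped_edges, valid):
        for r, wv, ok in zip(ref_row, warp_row, valid_row):
            if ok:
                total += abs(r - wv)
                count += 1
    return total / count if count else 0.0

# Toy scene: a vertical step edge seen at column 4 in the reference view
# and at column 2 in the source view (true depth 2.0, fb = 4.0).
H, W, FB = 6, 10, 4.0
ref_img = [[1.0 if x >= 4 else 0.0 for x in range(W)] for _ in range(H)]
src_img = [[1.0 if x >= 2 else 0.0 for x in range(W)] for _ in range(H)]
ref_edges, src_edges = sobel_edges(ref_img), sobel_edges(src_img)

def loss_for_depth(d):
    depth = [[d] * W for _ in range(H)]
    warped, valid = warp_to_reference(src_edges, depth, FB)
    return edge_structure_loss(ref_edges, warped, valid)

loss_true = loss_for_depth(2.0)   # correct depth hypothesis
loss_wrong = loss_for_depth(4.0)  # incorrect depth hypothesis
```

In this toy setup the correct depth aligns the warped source edges with the reference edges, so `loss_true` is zero, while the incorrect depth misaligns them and `loss_wrong` is positive. This is the sense in which a warped edge map can act as a self-supervision signal for depth.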
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grant 62062056, and in part by the Ningxia Graduate Education and Teaching Reform Research and Practice Project 2021.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Li, P., Wu, S., Zhang, X., Peng, Y., Zhang, B., Wang, B. (2024). Self-supervised Edge Structure Learning for Multi-view Stereo and Parallel Optimization. In: Rudinac, S., et al. MultiMedia Modeling. MMM 2024. Lecture Notes in Computer Science, vol 14556. Springer, Cham. https://doi.org/10.1007/978-3-031-53311-2_33
DOI: https://doi.org/10.1007/978-3-031-53311-2_33
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-53310-5
Online ISBN: 978-3-031-53311-2
eBook Packages: Computer Science, Computer Science (R0)