
Robust 3D reconstruction from uncalibrated small motion clips

  • Original article
  • Published in The Visual Computer

Abstract

Small motion can be induced from burst video clips captured by a handheld camera when the shutter button is pressed. Although an uncalibrated burst video clip conveys valuable parallax information, the baseline between its frames is generally small, which makes 3D scene reconstruction difficult. Existing methods usually employ a simplified camera parameterization with keypoint-based structure from small motion (SFSM), followed by a tailored dense reconstruction. However, such SFSM methods are sensitive to insufficient or unreliable keypoint features, and the subsequent dense reconstruction may fail to recover detailed surfaces. In this paper, we propose a robust 3D reconstruction pipeline that leverages both keypoint and line segment features from video clips to alleviate the uncertainty induced by the small baseline. A joint feature-based structure from small motion method is first presented to improve the robustness of self-calibration with line segment constraints; then, a noise-aware PatchMatch stereo module is proposed to improve the accuracy of the dense reconstruction. Finally, a confidence-weighted fusion process further suppresses depth noise and mitigates erroneous depth values. The proposed method reduces self-calibration failures when keypoints are insufficient, while recovering detailed 3D surfaces. In comparison with state-of-the-art methods, our method achieves more robust and accurate 3D reconstruction results on a variety of challenging scenes.
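As a rough illustration of the final step described above, the sketch below shows one common form of confidence-weighted depth fusion in NumPy: per-view depth estimates (assumed already warped into a common reference view) are averaged using their confidences as weights, and estimates below a confidence threshold are suppressed so that noisy depths do not pollute the result. The function name, the `min_conf` threshold, and the input layout are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def fuse_depth_maps(depths, confidences, min_conf=0.3):
    """Fuse per-view depth maps into one map by confidence weighting.

    depths, confidences: lists of H x W float arrays, one per view,
    assumed to be aligned to a common reference view beforehand.
    Pixels whose confidence falls below `min_conf` are zeroed out so
    that low-confidence (likely erroneous) depths are suppressed.
    Returns the fused depth map and a validity mask.
    """
    D = np.stack(depths)        # (N, H, W) depth hypotheses
    C = np.stack(confidences)   # (N, H, W) per-pixel confidences
    C = np.where(C >= min_conf, C, 0.0)   # suppress unreliable estimates
    w_sum = C.sum(axis=0)
    # Weighted average; pixels with no surviving estimate stay at 0.
    fused = np.divide((C * D).sum(axis=0), w_sum,
                      out=np.zeros_like(w_sum), where=w_sum > 0)
    valid = w_sum > 0
    return fused, valid
```

In practice, the weighted average is only one design choice; a weighted median is also common when outliers are heavy-tailed, at the cost of losing sub-sample smoothness.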



Notes

  1. The Zed camera is a binocular (stereo) camera; we use the output of its left camera in our experiments.


Acknowledgements

This work was supported by the National Key Research and Development Program of China (No. 2018AAA0103002) and the National Natural Science Foundation of China (No. 61702482).

Author information

Corresponding author

Correspondence to Zhaoxin Li.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Li, Z., Zuo, W., Wang, Z. et al. Robust 3D reconstruction from uncalibrated small motion clips. Vis Comput 38, 1589–1605 (2022). https://doi.org/10.1007/s00371-021-02090-w

