
Resolving Scale Ambiguity in Multi-view 3D Reconstruction Using Dual-Pixel Sensors

Conference paper · Computer Vision – ECCV 2024 (ECCV 2024)

Abstract

Multi-view 3D reconstruction, namely structure-from-motion and multi-view stereo, is an essential component of 3D computer vision. In general, multi-view 3D reconstruction suffers from scale ambiguity unless a reference object of known size is recorded together with the scene, or the camera poses are pre-calibrated. In this paper, we show that multi-view images recorded by a dual-pixel (DP) sensor allow us to automatically resolve the scale ambiguity without requiring a reference object or pre-calibration. Specifically, the defocus blurs observed in DP images provide sufficient information for determining the scale when paired with the depth maps (up to scale) recovered by the multi-view 3D reconstruction. Based on this observation, we develop a simple yet effective linear solution method to determine the absolute scale in multi-view 3D reconstruction. Experiments demonstrate the effectiveness of the proposed method on diverse scenes recorded with different cameras/lenses. Code and data are available at https://github.com/kohei-ashida/dp-sfm.
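The core idea described in the abstract, that absolute defocus-derived depth cues paired with up-to-scale reconstructed depths admit a linear solution for the global scale, can be sketched as a least-squares fit. The sketch below is illustrative only (the function name and the closed-form solve are assumptions, not taken from the authors' released code, which handles the full DP defocus model):

```python
# Hypothetical sketch: given per-point depths z_i recovered up to scale by
# SfM/MVS and absolute depth estimates d_i derived from dual-pixel defocus
# cues, the global scale s minimizing sum_i (s * z_i - d_i)^2 has a
# closed-form linear least-squares solution.

def estimate_scale(relative_depths, dp_depths):
    """Return the scale s minimizing sum_i (s * z_i - d_i)^2.

    relative_depths: up-to-scale depths z_i from the reconstruction
    dp_depths: absolute depth estimates d_i from dual-pixel defocus cues
    """
    numerator = sum(z * d for z, d in zip(relative_depths, dp_depths))
    denominator = sum(z * z for z in relative_depths)
    return numerator / denominator

# Example: a reconstruction at half the true metric scale.
z = [0.5, 1.0, 1.5, 2.0]   # up-to-scale depths
d = [1.0, 2.0, 3.0, 4.0]   # metric depths from defocus cues
s = estimate_scale(z, d)   # -> 2.0
```

In practice such a fit would be wrapped in a robust estimator (e.g. RANSAC) to tolerate outliers in either depth source; the paper's actual formulation should be consulted for the precise linear system.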


Notes

  1. Agisoft Metashape, https://www.agisoft.com/, last accessed on July 12, 2024.

  2. DPRSplit, https://www.fastrawviewer.com/DPRSplit, last accessed on July 12, 2024.


Acknowledgments

This work was supported by JSPS KAKENHI Grant Number JP23H05491.

Author information

Corresponding author: Kohei Ashida.


Electronic supplementary material

Supplementary material 1 (PDF 8757 KB)


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Ashida, K., Santo, H., Okura, F., Matsushita, Y. (2025). Resolving Scale Ambiguity in Multi-view 3D Reconstruction Using Dual-Pixel Sensors. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15108. Springer, Cham. https://doi.org/10.1007/978-3-031-72973-7_10


  • DOI: https://doi.org/10.1007/978-3-031-72973-7_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72972-0

  • Online ISBN: 978-3-031-72973-7

  • eBook Packages: Computer Science, Computer Science (R0)
