Abstract
Multi-view 3D reconstruction, namely structure-from-motion and multi-view stereo, is an essential component in 3D computer vision. In general, multi-view 3D reconstruction suffers from unknown scale ambiguity unless a reference object of known size is recorded together with the scene, or the camera poses are pre-calibrated. In this paper, we show that multi-view images recorded by a dual-pixel (DP) sensor allow us to automatically resolve the scale ambiguity without requiring a reference object or pre-calibration. Specifically, the observed defocus blurs in DP images provide sufficient information for determining the scale when paired together with the depth maps (up to scale) recovered from the multi-view 3D reconstruction. Based on this observation, we develop a simple yet effective linear solution method to determine the absolute scale in multi-view 3D reconstruction. Experiments demonstrate the effectiveness of the proposed method with diverse scenes recorded with different cameras/lenses. Code and data are available at https://github.com/kohei-ashida/dp-sfm.
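The abstract mentions a "simple yet effective linear solution" that aligns up-to-scale depth maps from multi-view reconstruction with absolute depth cues derived from dual-pixel defocus. As a rough illustration of the general idea (not the authors' actual formulation, which is in the paper), a global scale that best aligns relative depths to absolute depth estimates admits a closed-form least-squares solution:

```python
import numpy as np

def estimate_scale(d, z):
    """Closed-form least-squares scale aligning up-to-scale depths d
    to absolute depth cues z: s = argmin_s ||s*d - z||^2 = (d.z)/(d.d).
    Hypothetical sketch; the paper's own linear system differs."""
    d = np.asarray(d, dtype=float).ravel()
    z = np.asarray(z, dtype=float).ravel()
    return float(d @ z / (d @ d))

# Example: relative depths that are off from the metric depths
# by a uniform factor of 2.5 are recovered exactly.
d = np.array([1.0, 2.0, 3.0])   # up-to-scale depths (e.g. from SfM/MVS)
z = 2.5 * d                     # absolute depth cues (e.g. from DP defocus)
s = estimate_scale(d, z)        # -> 2.5
```

In practice such a fit would be wrapped in a robust estimator (e.g. RANSAC) to handle noisy per-pixel defocus cues; the variable names `d`, `z`, and the one-parameter model are assumptions made for this sketch.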
Notes
1. Agisoft Metashape: https://www.agisoft.com/, last accessed on July 12, 2024.
2. DPRSplit: https://www.fastrawviewer.com/DPRSplit, last accessed on July 12, 2024.
Acknowledgments
This work was supported by JSPS KAKENHI Grant Number JP23H05491.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Ashida, K., Santo, H., Okura, F., Matsushita, Y. (2025). Resolving Scale Ambiguity in Multi-view 3D Reconstruction Using Dual-Pixel Sensors. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15108. Springer, Cham. https://doi.org/10.1007/978-3-031-72973-7_10
DOI: https://doi.org/10.1007/978-3-031-72973-7_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72972-0
Online ISBN: 978-3-031-72973-7
eBook Packages: Computer Science, Computer Science (R0)