Disambiguating Monocular Depth Estimation with a Single Transient

  • Conference paper
  • In: Computer Vision – ECCV 2020 (ECCV 2020)

Abstract

Monocular depth estimation algorithms successfully predict the relative depth order of objects in a scene. However, because of the fundamental scale ambiguity associated with monocular images, these algorithms fail at correctly predicting true metric depth. In this work, we demonstrate how a depth histogram of the scene, which can be readily captured using a single-pixel time-resolved detector, can be fused with the output of existing monocular depth estimation algorithms to resolve the depth ambiguity problem. We validate this novel sensor fusion technique experimentally and in extensive simulation. We show that it significantly improves the performance of several state-of-the-art monocular depth estimation algorithms.
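The fusion described here rests on a classic idea: rescale the relative depths predicted by a monocular network so that their distribution matches the metric depth histogram measured by the time-resolved detector. The sketch below is an illustrative CDF-based histogram matching routine only, not the authors' actual pipeline; the function name, binning interface, and interpolation scheme are our assumptions (a real transient measurement would additionally require calibration for radiometric falloff and detector response).

```python
import numpy as np

def match_depth_histogram(pred_depth, target_hist, bin_edges):
    """Remap a relative monocular depth map so that its value distribution
    matches a target metric depth histogram (e.g. from a single-pixel
    time-resolved measurement), via standard CDF-based histogram matching.

    pred_depth  : 2D array of relative depth predictions
    target_hist : 1D array of counts per metric depth bin (assumed nonzero)
    bin_edges   : 1D array of metric bin edges, len(target_hist) + 1
    """
    pred_flat = pred_depth.ravel()
    pred_sorted = np.sort(pred_flat)

    # Empirical CDF of the predicted depths: for each pixel, the fraction
    # of pixels with depth <= its own (a quantile in (0, 1]).
    quantiles = (
        np.searchsorted(pred_sorted, pred_flat, side="right") / pred_flat.size
    )

    # CDF of the target histogram, evaluated at the metric bin edges
    # (leading 0 so the inverse CDF covers the full [0, 1] range).
    target_cdf = np.concatenate([[0.0], np.cumsum(target_hist)])
    target_cdf /= target_cdf[-1]

    # Inverse target CDF: map each pixel's quantile to a metric depth.
    metric = np.interp(quantiles, target_cdf, bin_edges)
    return metric.reshape(pred_depth.shape)
```

Because the mapping is monotone in the quantiles, the relative depth ordering produced by the monocular network is preserved; only the scale and distribution of depths are corrected to the measured metric histogram.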



Acknowledgments

D.L. was supported by a Stanford Graduate Fellowship. C.M. was supported by an ORISE Intelligence Community Postdoctoral Fellowship. G.W. was supported by an NSF CAREER Award (IIS 1553333), a Sloan Fellowship, by the KAUST Office of Sponsored Research through the Visual Computing Center CCF grant, and a PECASE by the ARL.

Author information

Correspondence to Mark Nishimura.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (zip 34424 KB)


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Nishimura, M., Lindell, D.B., Metzler, C., Wetzstein, G. (2020). Disambiguating Monocular Depth Estimation with a Single Transient. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds) Computer Vision – ECCV 2020. Lecture Notes in Computer Science, vol 12366. Springer, Cham. https://doi.org/10.1007/978-3-030-58589-1_9


  • DOI: https://doi.org/10.1007/978-3-030-58589-1_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58588-4

  • Online ISBN: 978-3-030-58589-1

  • eBook Packages: Computer Science (R0)
