
GigaDepth: Learning Depth from Structured Light with Branching Neural Networks

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13693)

Abstract

Structured light-based depth sensors provide accurate depth information independently of the scene appearance by extracting pattern positions from the captured pixel intensities. Spatial neighborhood encoding, in particular, is a popular structured light approach for off-the-shelf hardware. However, it suffers from the distortion and fragmentation of the projected pattern by the scene’s geometry in the vicinity of a pixel. This forces algorithms to find a delicate balance between depth prediction accuracy and robustness to pattern fragmentation or appearance change. While stereo matching provides more robustness at the expense of accuracy, we show that learning to regress a pixel’s position within the projected pattern is not only more accurate when combined with classification but can be made equally robust. We propose to split the regression problem into smaller classification sub-problems in a coarse-to-fine manner with the use of a weight-adaptive layer that efficiently implements branching per-pixel Multilayer Perceptrons applied to features extracted by a Convolutional Neural Network. As our approach requires full supervision, we train our algorithm on a rendered dataset sufficiently close to the real-world domain. On a separately captured real-world dataset, we show that our network outperforms the state of the art and is significantly more robust than other regression-based approaches.
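To make the decomposition concrete, the following is a minimal, hypothetical PyTorch sketch of a single classification level followed by a branch-specific per-pixel regressor. It is not the authors' implementation: the sizes FEAT_DIM, K and HIDDEN, the hard argmax branch selection and the coarse-plus-residual decoding are all illustrative assumptions, and the per-pixel weight gather merely stands in for the paper's efficient weight-adaptive layer.

```python
# A minimal, hypothetical sketch of one classification level followed by a
# branch-specific per-pixel regressor. NOT the authors' implementation:
# FEAT_DIM, K and HIDDEN are invented sizes, and the per-pixel weight
# gather merely emulates the paper's efficient weight-adaptive layer.
import torch
import torch.nn as nn
import torch.nn.functional as F

FEAT_DIM, K, HIDDEN = 32, 16, 16  # illustrative, not from the paper

class BranchingHead(nn.Module):
    def __init__(self):
        super().__init__()
        # Coarse step: classify each pixel into one of K pattern segments.
        self.cls = nn.Conv2d(FEAT_DIM, K, kernel_size=1)
        # Fine step: a bank of K tiny two-layer MLPs, one per segment.
        self.w1 = nn.Parameter(torch.randn(K, HIDDEN, FEAT_DIM) * 0.05)
        self.b1 = nn.Parameter(torch.zeros(K, HIDDEN))
        self.w2 = nn.Parameter(torch.randn(K, 1, HIDDEN) * 0.05)
        self.b2 = nn.Parameter(torch.zeros(K, 1))

    def forward(self, feat):                         # feat: (B, FEAT_DIM, H, W)
        logits = self.cls(feat)                      # (B, K, H, W)
        idx = logits.argmax(dim=1)                   # hard branch choice per pixel
        B, C, H, W = feat.shape
        f = feat.permute(0, 2, 3, 1).reshape(-1, C)  # (B*H*W, C)
        i = idx.reshape(-1)                          # (B*H*W,)
        # "Weight-adaptive" step: gather each pixel's own MLP weights by class.
        h = F.relu(torch.einsum('nhc,nc->nh', self.w1[i], f) + self.b1[i])
        r = torch.einsum('noh,nh->no', self.w2[i], h).squeeze(-1) + self.b2[i, 0]
        # Decode: coarse segment index plus regressed residual, in [0, 1).
        x = (i.float() + r) / K
        return logits, x.view(B, H, W)

feat = torch.randn(2, FEAT_DIM, 60, 80)              # stand-in CNN features
logits, xpos = BranchingHead()(feat)                 # per-pixel pattern position
```

During training, the hard argmax would have to be replaced or bypassed (e.g. by supervising the class labels directly, which the full supervision described in the abstract would permit) so that gradients also reach the classifier; note also that the paper stacks several such splits coarse-to-fine rather than using the single level shown here.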


Notes

  1. Randomly selected textures on the planes used as walls, as well as on cube, sphere, cylinder and pill shapes (see the illustrative sketch below).
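For illustration only, such a randomized scene description could be sampled as in the following sketch; the shape names follow the footnote, while the texture pool, file names and object count are invented, and no rendering is performed here.

```python
# Illustrative only: one way such a randomized scene description could be
# sampled. The shape names follow the footnote; the texture pool, file
# names and object count are invented, and no rendering is performed here.
import random
from dataclasses import dataclass

SHAPES = ["cube", "sphere", "cylinder", "pill"]
TEXTURES = [f"texture_{i:03d}.png" for i in range(100)]  # placeholder pool

@dataclass
class SceneObject:
    shape: str    # primitive placed in front of the textured wall planes
    texture: str  # randomly selected texture, as in the footnote

def sample_scene(n_objects=5, seed=None):
    rng = random.Random(seed)
    wall_texture = rng.choice(TEXTURES)       # texture for the wall planes
    objects = [SceneObject(rng.choice(SHAPES), rng.choice(TEXTURES))
               for _ in range(n_objects)]
    return wall_texture, objects

wall, objs = sample_scene(seed=0)
```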


Acknowledgements

The research leading to these results received funding from the EC Horizon 2020 Programme for Research and Innovation under grant agreement No. 101017089 (TraceBot) and from the Austrian Science Fund (FWF) under grant agreement No. I3969-N30 (InDex).

Author information


Corresponding author

Correspondence to Simon Schreiberhuber.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 15,713 KB)


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Schreiberhuber, S., Weibel, JB., Patten, T., Vincze, M. (2022). GigaDepth: Learning Depth from Structured Light with Branching Neural Networks. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13693. Springer, Cham. https://doi.org/10.1007/978-3-031-19827-4_13


  • DOI: https://doi.org/10.1007/978-3-031-19827-4_13


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19826-7

  • Online ISBN: 978-3-031-19827-4

  • eBook Packages: Computer Science, Computer Science (R0)
