
Sparse depth densification for monocular depth estimation

Multimedia Tools and Applications

Abstract

Dense depth prediction from a single image and a few sparse depth measurements has attracted increasing attention because it offers a low-cost, efficient way to obtain high-quality depth information. However, existing methods treat sparse depth only as an independent input dimension and ignore its relationship with the image itself, which limits prediction accuracy. To address this problem, this paper proposes a sparse depth densification method that fully exploits the relationship between sparse depth and the image to achieve more accurate depth estimation. Based on the prior that regions belonging to the same object category tend to have similar depth values, a Depth Densification Map (DDM) is constructed from the sparse depth and the segmentation labels obtained by unsupervised image segmentation. Meanwhile, considering the potential errors in the DDM, a Depth Error Map (DEM) is designed to correct it. A depth estimation network is then built on the idea of multi-scale fusion, and the proposed maps are combined with the single image as the network input for training and testing. Extensive experiments on the NYU Depth v2 and Make3D datasets demonstrate the superiority of the proposed approach. Our code is available at https://github.com/Jennifer108/Sparse-Depth-Densification.
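The densification step described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `densify_sparse_depth` and the choice of filling each segment with the mean of its sparse samples are assumptions, since the exact aggregation rule is not stated here.

```python
import numpy as np

def densify_sparse_depth(seg_labels, sparse_depth):
    """Sketch of a Depth Densification Map (DDM): under the prior that
    pixels of the same segment share similar depth, propagate the mean
    of the sparse depth samples inside each segment to the whole segment.

    seg_labels:   (H, W) integer array from unsupervised segmentation.
    sparse_depth: (H, W) float array; zeros mark pixels without a sample.
    """
    ddm = np.zeros_like(sparse_depth, dtype=np.float32)
    valid = sparse_depth > 0  # sparse measurements are the nonzero entries
    for label in np.unique(seg_labels):
        region = seg_labels == label
        samples = sparse_depth[region & valid]
        if samples.size > 0:          # segments with no sample stay zero
            ddm[region] = samples.mean()
    return ddm
```

A DEM-style correction could then be estimated from the residual between this map and the sparse measurements at the sampled pixels, but that refinement is left to the network in the paper's pipeline.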


Data Availability

The datasets analysed during the current study are available in the NYU Depth V2 repository (https://cs.nyu.edu/silberman/datasets/nyu_depth_v2.html) and the Make3D repository (http://make3d.cs.cornell.edu/data.html).


Acknowledgements

This work was supported by National Key Research and Development Program of China (No. 2020YFC1511700).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Yanzhu Hu.

Ethics declarations

Conflicts of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Liang, Z., Fang, T., Hu, Y. et al. Sparse depth densification for monocular depth estimation. Multimed Tools Appl 83, 14821–14838 (2024). https://doi.org/10.1007/s11042-023-15757-4

