Abstract
Dense depth prediction from a single image and a few sparse depth measurements has attracted increasing attention because it offers a low-cost and efficient way to obtain high-quality depth information. However, existing methods treat sparse depth as an independent input and ignore the relationship between the sparse depth and the image itself, which limits prediction accuracy. To address this problem, this paper proposes a sparse depth densification method that fully exploits the relationship between sparse depth and the image to achieve more accurate depth estimation. Based on the prior that regions belonging to the same object category tend to have similar depth values, a Depth Densification Map (DDM) is constructed from the sparse depth and the segmentation labels produced by unsupervised image segmentation, thereby densifying the sparse depth. Meanwhile, since the DDM may contain errors, a Depth Error Map (DEM) is designed to correct it. A depth estimation network is then built on the idea of multi-scale fusion, and the proposed maps are combined with the single image as the network input for training and testing. Extensive experiments on the NYU Depth v2 and Make3D datasets demonstrate the superiority of the proposed approach. Our code is available at https://github.com/Jennifer108/Sparse-Depth-Densification.
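For illustration, the sketch below shows one plausible way to form a DDM from unsupervised segmentation labels and sparse depth samples, following the stated prior that pixels of the same segment share similar depth. The function names, the per-segment mean-fill rule, and the residual-based DEM are our assumptions for this sketch, not the authors' exact implementation.

```python
import numpy as np

def densify_sparse_depth(seg_labels, sparse_depth):
    """Hypothetical DDM construction: propagate each segment's
    mean sparse-depth value to every pixel of that segment.

    seg_labels:   (H, W) int array from unsupervised segmentation
    sparse_depth: (H, W) float array, 0 where there is no measurement
    Returns the dense DDM as an (H, W) float array.
    """
    ddm = np.zeros_like(sparse_depth, dtype=np.float64)
    measured = sparse_depth > 0
    for label in np.unique(seg_labels):
        region = seg_labels == label
        samples = sparse_depth[region & measured]
        if samples.size > 0:
            # Prior: pixels of the same segment have similar depth,
            # so fill the whole segment with the mean of its samples.
            ddm[region] = samples.mean()
    return ddm

def depth_error_map(ddm, sparse_depth):
    """Hypothetical DEM: at each measured pixel, record the residual
    between the true sparse measurement and the densified value."""
    dem = np.zeros_like(ddm)
    measured = sparse_depth > 0
    dem[measured] = sparse_depth[measured] - ddm[measured]
    return dem
```

Under this reading, the DDM and DEM would be stacked with the RGB image as extra input channels to the multi-scale fusion network; the exact fill rule and correction scheme used in the paper may differ.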
Data Availability
The datasets generated and/or analysed during the current study are available in the NYU Depth Dataset V2 repository (https://cs.nyu.edu/silberman/datasets/nyu_depth_v2.html) and the Make3D repository (http://make3d.cs.cornell.edu/data.html).
Acknowledgements
This work was supported by the National Key Research and Development Program of China (No. 2020YFC1511700).
Ethics declarations
Conflicts of interest
The authors have no competing interests to declare that are relevant to the content of this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Liang, Z., Fang, T., Hu, Y. et al. Sparse depth densification for monocular depth estimation. Multimed Tools Appl 83, 14821–14838 (2024). https://doi.org/10.1007/s11042-023-15757-4