Robust depth completion based on Semantic Aggregation

Abstract

Guided by information from RGB images, depth completion methods reconstruct dense depth maps from sparse depth input. However, the varying density of valid pixels in sparse depth maps poses a significant challenge to the robustness of completion models. To improve the robustness of depth completion, we propose a two-stage model called Semantic Aggregated Depth Completion (SADC), comprising a coarse-grained completion stage and a fine-grained completion stage. In the coarse-grained stage, the Semantic Extraction Network (SEN) extracts RGB features and passes them to the Dynamic Semantic Aggregation (DSA) module, which predicts a local semantic relationship (LSR) matrix. DSA iteratively aggregates valid depth information according to the LSR matrix, producing a coarse-grained completion result. In the fine-grained stage, SADC uses the Semantic Guidance Network (SGN) and Semantic Guidance Fusion (SGF) modules to refine the dense depth features derived from the coarse result with multi-level RGB features and to predict the fine-grained completion result. We validate our method on NYU-v2 and KITTI under different valid-pixel densities. The results demonstrate that SADC achieves the best results on these benchmarks and remains robust to different densities without retraining.
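
To make the two-stage pipeline concrete, below is a minimal, illustrative PyTorch sketch. Everything in it is an assumption for illustration only: the module bodies, the 3x3 neighborhood, the iteration count, and the single-scale residual refinement are simplified stand-ins, not the authors' SADC implementation (in the paper, SGN and SGF fuse RGB guidance at multiple levels).

```python
import torch
import torch.nn as nn

class DynamicSemanticAggregation(nn.Module):
    """Stand-in for DSA: predicts a local semantic relationship (LSR) matrix
    from RGB features, then iteratively aggregates valid neighboring depth
    values. Zero depth is treated as invalid (a hypothetical convention)."""
    def __init__(self, feat_ch, k=3, iters=3):
        super().__init__()
        self.k, self.iters = k, iters
        self.lsr_head = nn.Conv2d(feat_ch, k * k, kernel_size=1)  # one weight per neighbor
        self.unfold = nn.Unfold(kernel_size=k, padding=k // 2)

    def forward(self, rgb_feat, sparse_depth):
        b, _, h, w = sparse_depth.shape
        lsr = torch.softmax(self.lsr_head(rgb_feat), dim=1)        # (B, k*k, H, W)
        depth = sparse_depth
        valid = (sparse_depth > 0).float()
        for _ in range(self.iters):
            d_nb = self.unfold(depth).view(b, -1, h, w)            # k*k neighbors per pixel
            v_nb = self.unfold(valid).view(b, -1, h, w)
            w_nb = lsr * v_nb                                      # mask out invalid neighbors
            agg = (w_nb * d_nb).sum(1, keepdim=True) \
                  / w_nb.sum(1, keepdim=True).clamp(min=1e-6)
            depth = valid * sparse_depth + (1 - valid) * agg       # observed pixels stay fixed
            valid = ((valid + v_nb.sum(1, keepdim=True)) > 0).float()
        return depth

class SADCSketch(nn.Module):
    """Two-stage pipeline: coarse completion by DSA, then RGB-guided refinement."""
    def __init__(self, feat_ch=32):
        super().__init__()
        self.sen = nn.Sequential(                                  # stand-in for SEN
            nn.Conv2d(3, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True))
        self.dsa = DynamicSemanticAggregation(feat_ch)
        self.refine = nn.Sequential(                               # stand-in for SGN + SGF
            nn.Conv2d(feat_ch + 1, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, 1, 3, padding=1))

    def forward(self, rgb, sparse_depth):
        feat = self.sen(rgb)
        coarse = self.dsa(feat, sparse_depth)                      # coarse-grained stage
        fine = coarse + self.refine(torch.cat([feat, coarse], 1))  # fine-grained stage
        return coarse, fine

# Smoke test: a 64x64 image with roughly 5% valid depth pixels.
rgb = torch.randn(1, 3, 64, 64)
gt = torch.rand(1, 1, 64, 64) + 0.1
sparse = gt * (torch.rand(1, 1, 64, 64) < 0.05).float()
coarse, fine = SADCSketch()(rgb, sparse)
print(coarse.shape, fine.shape)  # torch.Size([1, 1, 64, 64]) twice
```

In this sketch, robustness to input density comes from normalizing the aggregation only over valid neighbors: observed pixels act as fixed anchors, and the RGB-predicted LSR weights decide how their values spread into empty regions across iterations.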

Availability of data and materials

NYU-v2 Benchmark: https://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html
KITTI Benchmark: https://www.cvlibs.net/datasets/kitti

Acknowledgements

This research is funded by the Science and Technology Commission of Shanghai Municipality (20511105102). The computation is performed on the ECNU Multifunctional Platform for Innovation (001).

Funding

This research is funded by the Science and Technology Commission of Shanghai Municipality (20511105102).

Author information

Contributions

Zhichao Fu contributed to the design and implementation of the research, analyzed the results, and wrote the main manuscript text. Xin Li prepared Figs. 1 and 2 and wrote the manuscript. Tianyu Huai and Weijie Li prepared the experimental results and wrote the manuscript. Daoguo Dong and Liang He devised the project and supervised the research. All authors reviewed the manuscript.

Corresponding author

Correspondence to Daoguo Dong.

Ethics declarations

Competing interests

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Fu, Z., Li, X., Huai, T. et al. Robust depth completion based on Semantic Aggregation. Appl Intell 54, 3825–3840 (2024). https://doi.org/10.1007/s10489-024-05366-5
