Abstract
Image-guided depth completion has recently been widely explored and has achieved remarkable performance. However, most existing image-guided methods fuse multilevel or multimodal features by simple element-wise addition or concatenation, which limits fusion effectiveness. To address this issue, we propose a gated recurrent fusion UNet for more effective feature fusion. Specifically, a gated recurrent model adaptively selects and fuses useful information from multilevel features. Moreover, we introduce a dimension-wise attention based CSPN++ module that iteratively refines the depth estimate: dimension-wise attention fully learns feature interdependencies to obtain a more accurate neighborhood affinity matrix for spatial propagation. Comprehensive experiments demonstrate that the proposed method outperforms many state-of-the-art methods in both quantitative evaluation and visual quality.
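The gated fusion idea described in the abstract can be illustrated with a minimal numpy sketch. This is not the paper's implementation: the gate here is a single GRU-style update gate computed by a channel-mixing matrix `w_z` (a 1x1-convolution analogue), and all names (`gated_fusion`, `w_z`, `b_z`) are illustrative assumptions. It shows only the core mechanism: a learned gate decides, per element, how much of each feature map to keep, instead of plain addition or concatenation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(f_a, f_b, w_z, b_z):
    """GRU-style gated fusion of two (C, H, W) feature maps.

    w_z: (C, 2C) channel-mixing weights (1x1-conv analogue),
    b_z: (C,) bias.  Returns a (C, H, W) fused map.
    """
    # Compute the gate from the concatenated features.
    stacked = np.concatenate([f_a, f_b], axis=0)              # (2C, H, W)
    z = sigmoid(np.tensordot(w_z, stacked, axes=1)            # (C, H, W)
                + b_z[:, None, None])
    # Convex, element-wise blend: z selects from f_a, (1 - z) from f_b.
    return z * f_a + (1.0 - z) * f_b
```

Because the gate output is a convex combination, every fused value lies between the two input values at that position, which is what lets the network suppress an uninformative source rather than averaging it in.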
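The spatial-propagation refinement the abstract refers to can likewise be sketched. The following is a simplified single CSPN-style step under stated assumptions (a fixed 3x3 neighborhood, normalized absolute affinities with the residual weight kept for the pixel itself); the paper's dimension-wise attention, which produces the affinity tensor, is not modeled here, and the function name `cspn_step` is illustrative.

```python
import numpy as np

def cspn_step(depth, affinity):
    """One propagation step: each pixel's depth becomes an
    affinity-weighted average of its 8 neighbours plus itself.

    depth: (H, W) current depth estimate.
    affinity: (8, H, W), one raw weight per neighbour.
    """
    H, W = depth.shape
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
               (0, 1), (1, -1), (1, 0), (1, 1)]
    # Normalise so neighbour weights sum to < 1; the remainder
    # preserves the pixel's own current estimate.
    w = np.abs(affinity)
    w = w / (w.sum(axis=0, keepdims=True) + 1.0)
    self_w = 1.0 - w.sum(axis=0)
    out = self_w * depth
    padded = np.pad(depth, 1, mode='edge')   # replicate borders
    for k, (dy, dx) in enumerate(offsets):
        out += w[k] * padded[1 + dy:1 + dy + H, 1 + dx:1 + dx + W]
    return out
```

Iterating this step diffuses reliable depth values into their neighborhoods; learning a better affinity tensor (the role of dimension-wise attention in the paper) controls where that diffusion is allowed to flow.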
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant Nos. 61901392 and 62041109) and the Department of Science and Technology of Sichuan Province (Grant No. 2021YJ0109).
Author information
Contributions
TL: Conceptualization, Methodology, Software, Formal Analysis, Writing—Original Draft; XD: Resources, Supervision; HL: Visualization, Writing—Review.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, T., Dong, X. & Lin, H. Gated Recurrent Fusion UNet for Depth Completion. Neural Process Lett 55, 10463–10481 (2023). https://doi.org/10.1007/s11063-023-11334-w