Abstract
Laser radar (Lidar) plays an indispensable role in many safety-critical applications such as autonomous driving. However, the high sparsity and non-uniformity of raw laser data make reliable 3D scene understanding difficult, and traditional depth completion methods suffer from the highly ill-conditioned nature of the problem. This paper proposes a novel end-to-end road-semantic-guided depth completion neural network built around a specially designed Asymmetric Multiscale Convolution (AMC) structure. The network consists of two parts: a semantic part and a depth completion part. The semantic part is an image-Lidar joint segmentation sub-network that produces semantic masks (ground or object) for the subsequent stage. The depth completion part is composed of a series of AMC structures. By incorporating the semantic masks and treating ground and non-ground objects separately, the proposed AMC structure fits the depth distribution patterns characteristic of road scenes. Experiments on both synthesized and real datasets demonstrate that our method effectively improves the accuracy of depth completion.
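The core idea of semantic-guided completion, filling in missing depths for ground and non-ground pixels using only measurements from the same semantic class, can be illustrated with a deliberately simple sketch. This is not the paper's AMC network; the function below is a hypothetical toy that stands in for the two semantic branches by completing each class with its own per-class statistic (here, the class mean), so ground and object depth distributions never mix.

```python
import numpy as np

def semantic_masked_completion(sparse_depth, ground_mask):
    """Toy stand-in for mask-guided, class-separate depth completion.

    sparse_depth : 2D array, 0 marks a missing measurement.
    ground_mask  : boolean 2D array, True for ground pixels.

    Each semantic class (ground / non-ground) is completed using only
    its own sparse measurements, mimicking the separate treatment of
    the two classes; the fill rule is just the class mean.
    """
    dense = sparse_depth.astype(float).copy()
    for cls_mask in (ground_mask, ~ground_mask):
        known = cls_mask & (sparse_depth > 0)    # measured pixels of this class
        missing = cls_mask & (sparse_depth == 0)  # holes of this class
        if known.any():
            dense[missing] = sparse_depth[known].mean()
    return dense

# Ground (top row) and object (bottom row) depths stay separate:
sparse = np.array([[5.0, 0.0],
                   [0.0, 20.0]])
ground = np.array([[True, True],
                   [False, False]])
dense = semantic_masked_completion(sparse, ground)
# Ground hole is filled from ground depths, object hole from object depths.
```

In the actual network this per-class fill is replaced by learned AMC convolutions, but the sketch shows why the masks matter: a ground pixel's depth varies smoothly with image row, while object depths cluster around the object's range, so pooling them would blur both.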
Acknowledgements
This work was supported by the NSFC-Zhejiang Joint Fund for the Integration of Industrialization and Informatization under Grant No. U1709214 and by NSFC Grant No. 61571390.
Cite this article
Zou, N., Xiang, Z. & Chen, Y. RSDCN: A Road Semantic Guided Sparse Depth Completion Network. Neural Process Lett 51, 2737–2749 (2020). https://doi.org/10.1007/s11063-020-10226-7