Abstract
Depth sensing is critical to many computer vision applications, but generating accurate dense depth with a single sensor type remains challenging. A stereo camera can provide dense depth predictions but underperforms in texture-less, repetitive, and occluded areas, while a LiDAR sensor can produce accurate measurements but only a sparse map. In this paper, we advocate fusing LiDAR and stereo cameras for accurate dense depth sensing. We treat the fusion of multiple sensors as a multimodal prediction problem and propose a novel end-to-end learning framework, dubbed LSMD-Net, to faithfully generate dense depth. The proposed method has a dual-branch disparity predictor and outputs a bimodal Laplacian distribution over disparity at each pixel. This distribution has two modes, which capture the information from the two branches. At each pixel, the prediction from the branch with higher confidence is selected as the final disparity. Our fusion method can be applied to different types of LiDAR. Besides existing datasets captured with a conventional spinning LiDAR, we build a multi-sensor system with a non-repeating scanning LiDAR and a stereo camera, and construct a depth prediction dataset with this system. Evaluations on both the KITTI datasets and our self-collected dataset demonstrate the superiority of the proposed method in terms of accuracy and computation time.
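The winner-takes-all selection the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the network outputs per-pixel means, scales, and mixture weights for the two Laplacian modes (one per branch), with all array names chosen for illustration.

```python
import numpy as np

def select_disparity(mu, b, w):
    """Pick, at each pixel, the Laplacian mode with the higher mixture
    weight (confidence) as the final disparity.

    mu, b, w: arrays of shape (2, H, W) holding the per-mode means,
    scales, and weights of the bimodal Laplacian distribution.
    """
    idx = np.argmax(w, axis=0)                         # (H, W): dominant branch per pixel
    return np.take_along_axis(mu, idx[None], axis=0)[0]  # (H, W): selected disparity

# Toy example on a 1x2 "image" with two modes per pixel.
mu = np.array([[[10.0, 30.0]], [[12.0, 28.0]]])  # (2, 1, 2)
b  = np.array([[[1.0,  1.0]],  [[2.0,  2.0]]])
w  = np.array([[[0.7,  0.4]],  [[0.3,  0.6]]])
d  = select_disparity(mu, b, w)  # mode 0 wins at pixel 0, mode 1 at pixel 1
```

In this toy case the first pixel takes its disparity from mode 0 (weight 0.7) and the second from mode 1 (weight 0.6), mirroring the per-pixel branch selection described above.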
Acknowledgement
This work was supported by the National Key Research and Development Program of China under Grant 2020AAA0108302.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Yin, H. et al. (2023). LSMD-Net: LiDAR-Stereo Fusion with Mixture Density Network for Depth Sensing. In: Wang, L., Gall, J., Chin, TJ., Sato, I., Chellappa, R. (eds) Computer Vision – ACCV 2022. ACCV 2022. Lecture Notes in Computer Science, vol 13841. Springer, Cham. https://doi.org/10.1007/978-3-031-26319-4_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-26318-7
Online ISBN: 978-3-031-26319-4