Abstract
Depth sensing is critical to many computer vision applications, but generating accurate dense depth with a single sensor type remains challenging. A stereo camera can provide dense depth predictions but underperforms in texture-less, repetitive, and occluded areas, while a LiDAR sensor can produce accurate measurements but only a sparse map. In this paper, we advocate fusing LiDAR and stereo cameras for accurate dense depth sensing. We treat the fusion of multiple sensors as a multimodal prediction problem and propose a novel end-to-end learning framework, dubbed LSMD-Net, to faithfully generate dense depth. The proposed method has a dual-branch disparity predictor and outputs a bimodal Laplacian distribution over disparity at each pixel. This distribution has two modes, which capture the information from the two branches. At each pixel, the prediction from the branch with higher confidence is selected as the final disparity. Our fusion method can be applied to different types of LiDAR. Besides existing datasets captured with a conventional spinning LiDAR, we build a multi-sensor system with a non-repeating scanning LiDAR and a stereo camera, and construct a depth prediction dataset with this system. Evaluations on both the KITTI datasets and our self-collected dataset demonstrate the superiority of the proposed method in terms of accuracy and computation time.
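The winner-takes-all selection the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the network outputs per-pixel means, scales, and mixture weights for the two Laplacian modes (one per branch), with all array names chosen for illustration.

```python
import numpy as np

def select_disparity(mu, b, w):
    """Pick, at each pixel, the Laplacian mode with the higher mixture
    weight (confidence) as the final disparity.

    mu, b, w: arrays of shape (2, H, W) holding the per-mode means,
    scales, and weights of the bimodal Laplacian distribution.
    """
    idx = np.argmax(w, axis=0)                         # (H, W): dominant branch per pixel
    return np.take_along_axis(mu, idx[None], axis=0)[0]  # (H, W): selected disparity

# Toy example on a 1x2 "image" with two modes per pixel.
mu = np.array([[[10.0, 30.0]], [[12.0, 28.0]]])  # (2, 1, 2)
b  = np.array([[[1.0,  1.0]],  [[2.0,  2.0]]])
w  = np.array([[[0.7,  0.4]],  [[0.3,  0.6]]])
d  = select_disparity(mu, b, w)  # mode 0 wins at pixel 0, mode 1 at pixel 1
```

In this toy case the first pixel takes its disparity from mode 0 (weight 0.7) and the second from mode 1 (weight 0.6), mirroring the per-pixel branch selection described above.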
Acknowledgement
This work was supported by the National Key Research and Development Program of China under Grant 2020AAA0108302.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Yin, H. et al. (2023). LSMD-Net: LiDAR-Stereo Fusion with Mixture Density Network for Depth Sensing. In: Wang, L., Gall, J., Chin, TJ., Sato, I., Chellappa, R. (eds) Computer Vision – ACCV 2022. ACCV 2022. Lecture Notes in Computer Science, vol 13841. Springer, Cham. https://doi.org/10.1007/978-3-031-26319-4_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-26318-7
Online ISBN: 978-3-031-26319-4