Depth Estimation via Sparse Radar Prior and Driving Scene Semantics

  • Conference paper
Computer Vision – ACCV 2022 (ACCV 2022)

Abstract

Depth estimation is an essential module in the perception system of autonomous driving. State-of-the-art methods introduce LiDAR to improve monocular depth estimation, but LiDAR suffers from limited weather durability and high hardware cost. Unlike existing LiDAR-and-image methods, this paper proposes a two-stage network that integrates highly sparse radar data, with a sparse pre-mapping module for radar feature extraction and a feature fusion module for combining radar and image features. Exploiting the highly structured nature of driving scenes, we further incorporate scene semantics into the loss function, making the network focus on target regions. Finally, we propose a novel depth-dataset construction strategy on the nuScenes dataset that combines binary-mask-based filtering with interpolation. Extensive experiments demonstrate the effectiveness of the proposed method, which outperforms existing methods on all metrics.
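The abstract only sketches the mechanics. As a rough illustration, the Python fragment below is a minimal sketch of the two ideas, not the authors' implementation: projecting sparse radar returns into an image-aligned depth prior (the kind of input a pre-mapping module would consume), and an L1 depth loss re-weighted by scene semantics. All names (radar_to_sparse_depth, semantic_weighted_l1, class_weights) are hypothetical, and the radar points are assumed to be already transformed into the camera frame, as the nuScenes devkit can provide.

```python
# Hypothetical sketch -- NOT the paper's code. Assumes radar points are
# already expressed in the camera coordinate frame.
import numpy as np
import torch

def radar_to_sparse_depth(points_cam, K, h, w):
    """Project 3-D radar points (N, 3) in camera coordinates onto the
    image plane, producing a sparse (h, w) depth prior (0 = no return)."""
    depth = np.zeros((h, w), dtype=np.float32)
    z = points_cam[:, 2]
    valid = z > 0                        # keep points in front of the camera
    uvw = (K @ points_cam[valid].T).T    # pinhole projection with intrinsics K
    u = (uvw[:, 0] / uvw[:, 2]).astype(int)
    v = (uvw[:, 1] / uvw[:, 2]).astype(int)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    depth[v[inside], u[inside]] = z[valid][inside]
    return depth

def semantic_weighted_l1(pred, gt, seg, class_weights):
    """L1 depth loss re-weighted per pixel by its semantic class id, so
    that errors in target regions contribute more to the loss."""
    mask = gt > 0                        # supervise only where GT depth exists
    weights = class_weights[seg]         # (h, w) weight map from class ids
    return (weights * (pred - gt).abs())[mask].mean()
```

In practice the weight table would assign larger weights to target classes such as vehicles and pedestrians, so that errors in those regions dominate the loss while background regions contribute less.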

K. Zheng and S. Li—Co-first authors.



Acknowledgement

This research was funded by the Key R&D Projects of the Science & Technology Department of Sichuan Province of China under Grant 2021YFG0070.

Author information

Corresponding author

Correspondence to Shuguang Li.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zheng, K. et al. (2023). Depth Estimation via Sparse Radar Prior and Driving Scene Semantics. In: Wang, L., Gall, J., Chin, TJ., Sato, I., Chellappa, R. (eds) Computer Vision – ACCV 2022. ACCV 2022. Lecture Notes in Computer Science, vol 13842. Springer, Cham. https://doi.org/10.1007/978-3-031-26284-5_26

  • DOI: https://doi.org/10.1007/978-3-031-26284-5_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-26283-8

  • Online ISBN: 978-3-031-26284-5

  • eBook Packages: Computer Science, Computer Science (R0)
