Abstract
It is widely believed that sparse supervision is inferior to dense supervision for depth completion, yet the underlying reasons are rarely discussed. Motivated by this, we revisit the task of radar-camera depth completion and present a new method trained with sparse LiDAR supervision that outperforms previous dense-LiDAR-supervised methods in both accuracy and speed. Specifically, when trained with sparse LiDAR supervision, depth completion models usually output depth maps containing significant stripe-like artifacts. We find that this phenomenon is caused by the positional distribution pattern implicitly learned from sparse LiDAR supervision, which we term LiDAR Distribution Leakage (LDL). Based on this understanding, we present a novel Disruption-Compensation radar-camera depth completion framework to address the issue. The Disruption part deliberately disrupts the learning of the LiDAR positional distribution from sparse supervision, while the Compensation part leverages 3D spatial and 2D semantic information to compensate for the information lost to this disruption. Extensive experimental results demonstrate that, by reducing the impact of LDL, our sparsely supervised framework outperforms state-of-the-art densely supervised methods, with an 11.6% improvement in Mean Absolute Error (MAE) and a 1.6× speedup in Frames Per Second (FPS). The code is available at https://github.com/megvii-research/Sparse-Beats-Dense.
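To make the mechanism above concrete, the sketch below shows, in PyTorch, (i) a sparse-supervision loss that is evaluated only at pixels carrying a LiDAR return, and (ii) one simple way a Disruption step could break the fixed positional pattern of those pixels by randomly jittering their locations. The jitter-based disruption and all names here are illustrative assumptions drawn from the abstract's description, not the paper's actual design.

```python
# Minimal sketch of sparse LiDAR supervision with a positional "Disruption".
# The jitter below is a hypothetical stand-in for the paper's Disruption part.
import torch
import torch.nn.functional as F


def disrupt_sparse_gt(gt_depth: torch.Tensor, max_shift: int = 2) -> torch.Tensor:
    """Randomly shift each valid LiDAR pixel by up to max_shift pixels,
    so the network cannot memorize the fixed LiDAR scanline pattern.

    gt_depth: (B, 1, H, W) projected LiDAR map; zeros mark pixels with no return.
    """
    b, _, h, w = gt_depth.shape
    out = torch.zeros_like(gt_depth)
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    for i in range(b):
        mask = gt_depth[i, 0] > 0
        n = int(mask.sum())
        y = (ys[mask] + torch.randint(-max_shift, max_shift + 1, (n,))).clamp(0, h - 1)
        x = (xs[mask] + torch.randint(-max_shift, max_shift + 1, (n,))).clamp(0, w - 1)
        out[i, 0, y, x] = gt_depth[i, 0][mask]  # collisions: last write wins
    return out


def sparse_l1_loss(pred: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
    """L1 loss evaluated only where the sparse ground truth is valid."""
    mask = gt > 0
    return F.l1_loss(pred[mask], gt[mask])


# Usage: `pred` would come from the depth-completion network; here it is random.
pred = torch.rand(2, 1, 64, 128) * 80.0
gt = torch.rand(2, 1, 64, 128) * 80.0
gt[torch.rand_like(gt) > 0.05] = 0.0  # keep ~5% of pixels as "LiDAR" returns
loss = sparse_l1_loss(pred, disrupt_sparse_gt(gt))
print(f"sparse supervision loss: {loss.item():.3f}")
```

Because only a few percent of pixels are supervised, any fixed pattern in where they fall can leak into the prediction as stripe artifacts; randomizing the positions removes that signal, at the cost of slight label noise near depth boundaries.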
H. Li and M. Jing contributed equally to this work.
Notes
1. These object-level mask annotations will be released as well, along with the code.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Li, H. et al. (2025). Sparse Beats Dense: Rethinking Supervision in Radar-Camera Depth Completion. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15107. Springer, Cham. https://doi.org/10.1007/978-3-031-72967-6_8
DOI: https://doi.org/10.1007/978-3-031-72967-6_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72966-9
Online ISBN: 978-3-031-72967-6