DecoratingFusion: A LiDAR-Camera Fusion Network with the Combination of Point-Level and Feature-Level Fusion

  • Conference paper
Artificial Neural Networks and Machine Learning – ICANN 2024 (ICANN 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15017)

Abstract

LiDAR sensors and cameras play essential roles in autonomous driving, offering complementary information for 3D detection. State-of-the-art fusion methods integrate the two modalities at the feature level, but they mostly rely on a learned soft association between point clouds and images, which lacks interpretability and neglects the hard association between them. In this paper, we combine feature-level fusion with point-level fusion, using the hard association established by the calibration matrices to guide the generation of object queries. Specifically, in the early fusion stage, we use the 2D CNN features of images to decorate the point cloud data and employ two independent sparse convolutions to extract the decorated point cloud features. In the mid-level fusion stage, we initialize the queries with a center heatmap and embed the predicted class labels into the queries as auxiliary information, bringing the initial query positions closer to the actual centers of the targets. Extensive experiments on two popular datasets, KITTI and Waymo, demonstrate the superiority of DecoratingFusion.
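
As a rough illustration of the point-level (hard association) step described in the abstract, the sketch below projects each LiDAR point into the image with a calibration matrix and appends the image feature found at that pixel. It is not the authors' implementation: the function names `project_point` and `decorate_points`, the single 3x4 matrix `P`, the `stride` parameter, the nearest-pixel lookup, and the zero fill for points that miss the image are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def project_point(P: Matrix, xyz: Sequence[float]) -> Optional[Tuple[float, float, float]]:
    """Project a 3D point with a 3x4 projection matrix P (assumed given).

    Returns (u, v, depth) in pixel coordinates, or None when the point
    lies behind the camera (non-positive depth).
    """
    x, y, z = xyz
    hom = (x, y, z, 1.0)
    u_h, v_h, w = (sum(row[i] * hom[i] for i in range(4)) for row in P)
    if w <= 0.0:
        return None
    return u_h / w, v_h / w, w


def decorate_points(
    points: Sequence[Sequence[float]],
    feature_map: Sequence[Sequence[Sequence[float]]],
    P: Matrix,
    stride: int = 1,
) -> List[List[float]]:
    """Append image features to each LiDAR point via hard association.

    points:      list of [x, y, z, intensity]
    feature_map: H x W x C nested lists of 2D CNN features
    P:           3x4 matrix assumed to map LiDAR coordinates to pixels
    stride:      assumed downsampling factor of the feature map
    Points that project outside the image keep zero image features.
    """
    height = len(feature_map)
    width = len(feature_map[0])
    channels = len(feature_map[0][0])
    decorated = []
    for pt in points:
        feats = [0.0] * channels
        proj = project_point(P, pt[:3])
        if proj is not None:
            col = int(proj[0] // stride)
            row = int(proj[1] // stride)
            if 0 <= row < height and 0 <= col < width:
                feats = list(feature_map[row][col])
        decorated.append(list(pt) + feats)
    return decorated
```

Under these assumptions, each decorated point carries its original coordinates and intensity followed by C image channels, which is the kind of input a sparse 3D convolution backbone could then consume. A real pipeline would typically chain the dataset's extrinsic and intrinsic calibration matrices into `P` and might interpolate features rather than take the nearest pixel.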

Author information

Corresponding author

Correspondence to Han Sun.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Yin, Z., Sun, H., Liu, N., Zhou, H., Shen, J. (2024). DecoratingFusion: A LiDAR-Camera Fusion Network with the Combination of Point-Level and Feature-Level Fusion. In: Wand, M., Malinovská, K., Schmidhuber, J., Tetko, I.V. (eds) Artificial Neural Networks and Machine Learning – ICANN 2024. ICANN 2024. Lecture Notes in Computer Science, vol 15017. Springer, Cham. https://doi.org/10.1007/978-3-031-72335-3_8

  • DOI: https://doi.org/10.1007/978-3-031-72335-3_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72334-6

  • Online ISBN: 978-3-031-72335-3

  • eBook Packages: Computer Science (R0)
