
SGFusion: Camera-LiDAR Semantic and Geometric Fusion for 3D Object Detection

  • Conference paper
Neural Information Processing (ICONIP 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13625)

Abstract

Cameras and LiDAR are important sensors for achieving higher levels of autonomous driving, and the complementary information they provide offers opportunities for improving performance. However, fusing them is difficult because their data representations differ. In this work, we propose a novel fusion framework, SGFusion, which fuses images and point clouds at the semantic and geometric levels. The SGFusion framework consists of two sequential stages: a semantic fusion stage and a geometric fusion stage. First, in the semantic fusion stage, the point clouds are painted with object-level semantic information obtained from a 2D object detector, and the output candidates of the 2D object detector are saved. Then, the painted point clouds are fed to a LiDAR-based detector to obtain high-quality 3D detection candidates. Finally, in the geometric fusion stage, these 3D detection candidates are combined with the saved 2D detection candidates, and their geometric consistency is used to obtain more accurate detection results. Experimental results on the KITTI detection benchmark show that SGFusion achieves up to 3.64% AP improvement on the car class over three different baselines. Furthermore, our method outperforms prior state-of-the-art works on 3D detection of the car class on the KITTI testing benchmark.
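The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`project_point`, `paint_points`, `iou_2d`), the 3x4 projection matrix `P`, the choice to paint each point with the highest score of any 2D box containing its projection, and the use of 2D IoU as the geometric consistency measure are all assumptions made for illustration, since the abstract does not specify these details.

```python
from typing import List, Optional, Sequence, Tuple

Point3D = Tuple[float, float, float]
Box2D = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def project_point(point: Point3D, P: Sequence[Sequence[float]]) -> Optional[Tuple[float, float]]:
    """Project a 3D point (camera coordinates) to pixels with a 3x4 matrix P (assumed)."""
    h = (point[0], point[1], point[2], 1.0)
    u, v, w = (sum(P[r][c] * h[c] for c in range(4)) for r in range(3))
    if w <= 0:
        return None  # point lies behind the camera
    return (u / w, v / w)


def paint_points(points: List[Point3D], P, boxes: List[Box2D], scores: List[float]):
    """Semantic stage (illustrative): append a 2D detection score to each point.

    A point receives the highest score among the 2D boxes that contain its
    image projection, or 0.0 if no box contains it. This rule is an assumption.
    """
    painted = []
    for p in points:
        uv = project_point(p, P)
        best = 0.0
        if uv is not None:
            for (x1, y1, x2, y2), s in zip(boxes, scores):
                if x1 <= uv[0] <= x2 and y1 <= uv[1] <= y2:
                    best = max(best, s)
        painted.append((*p, best))
    return painted


def iou_2d(a: Box2D, b: Box2D) -> float:
    """Geometric stage (illustrative): IoU between a 2D candidate and the
    image-plane box of a projected 3D candidate, as a consistency score."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Toy pinhole camera with unit focal length (hypothetical calibration).
    P = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
    pts = [(1.0, 1.0, 2.0), (10.0, 10.0, 2.0)]
    print(paint_points(pts, P, [(0.0, 0.0, 1.0, 1.0)], [0.9]))
    print(iou_2d((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0)))
```

In a full pipeline, the painted points would be passed to a LiDAR-based detector, and the consistency score would be used together with the detection confidences when the 2D and 3D candidates are combined. How that combination is done is not stated in the abstract and is left out of this sketch.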

Author information

Corresponding author

Correspondence to Xinhua Zeng.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Chen, X., Zeng, X., Pang, C., Hu, X. (2023). SGFusion: Camera-LiDAR Semantic and Geometric Fusion for 3D Object Detection. In: Tanveer, M., Agarwal, S., Ozawa, S., Ekbal, A., Jatowt, A. (eds) Neural Information Processing. ICONIP 2022. Lecture Notes in Computer Science, vol 13625. Springer, Cham. https://doi.org/10.1007/978-3-031-30111-7_37

  • DOI: https://doi.org/10.1007/978-3-031-30111-7_37

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-30110-0

  • Online ISBN: 978-3-031-30111-7

  • eBook Packages: Computer Science, Computer Science (R0)
