SGFusion: Camera-LiDAR Semantic and Geometric Fusion for 3D Object Detection

Chen, Xuhua; Zeng, Xinhua; Pang, Chengxin; Hu, Xin

doi:10.1007/978-3-031-30111-7_37

Xuhua Chen ORCID: orcid.org/0000-0001-7108-4844¹²,
Xinhua Zeng¹²,
Chengxin Pang¹³ &
Xin Hu¹⁴

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13625)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

Cameras and LiDAR are considered important sensors for achieving higher-level autonomous driving, and the complementary information they provide offers more opportunities for improving performance. However, fusing them is difficult because their data representations differ. In this work, we propose a novel fusion framework, SGFusion, which fuses images and point clouds at the semantic and geometric levels. The SGFusion framework is divided into two sequential stages: a semantic fusion stage and a geometric fusion stage. First, in the semantic fusion stage, the point clouds are painted with object-level semantic information obtained from a 2D object detector. At the same time, the output candidates of the 2D object detector are saved. Then, the painted point clouds are fed to a LiDAR-based detector to obtain high-quality 3D detection candidates. Finally, in the geometric fusion stage, these 3D detection candidates are combined with the saved 2D detection candidates, and more accurate detection results are obtained by exploiting their geometric consistency. Experimental results on the KITTI detection benchmark show that SGFusion achieves up to 3.64% AP improvement on the car class over three different baselines. Furthermore, our method outperforms prior state-of-the-art works on 3D detection of the car class on the KITTI testing benchmark.
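
The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the painting rule or the consistency measure, so the code assumes that a point is painted with the confidence score of every 2D box it projects into, and that geometric consistency is measured as the 2D IoU between a projected 3D candidate and a 2D candidate. The function names (`paint_points`, `project_corners_to_box`, `iou_2d`), the input layouts, and the assumption that points are already expressed in the camera frame are all hypothetical.

```python
import numpy as np


def paint_points(points_xyz, proj, boxes_2d, scores, labels, num_classes):
    """Semantic fusion sketch: append per-class scores to each point.

    points_xyz: (N, 3) points, assumed to be in the camera frame already.
    proj:       (3, 4) camera projection matrix.
    boxes_2d:   (M, 4) 2D detections as [x1, y1, x2, y2] in pixels.
    scores:     (M,) detection confidences; labels: (M,) integer class ids.
    Returns an (N, 3 + num_classes) array of painted points.
    """
    n = points_xyz.shape[0]
    homo = np.hstack([points_xyz, np.ones((n, 1))])
    uvw = homo @ proj.T
    in_front = uvw[:, 2] > 1e-6                  # ignore points behind the camera
    depth = np.where(in_front, uvw[:, 2], 1.0)   # avoid dividing by <= 0
    uv = uvw[:, :2] / depth[:, None]

    paint = np.zeros((n, num_classes))
    for box, score, label in zip(boxes_2d, scores, labels):
        inside = (in_front
                  & (uv[:, 0] >= box[0]) & (uv[:, 0] <= box[2])
                  & (uv[:, 1] >= box[1]) & (uv[:, 1] <= box[3]))
        # Keep the highest score when several boxes of one class overlap.
        paint[inside, label] = np.maximum(paint[inside, label], score)
    return np.hstack([points_xyz, paint])


def project_corners_to_box(corners_xyz, proj):
    """Project the (8, 3) corners of a 3D candidate to an axis-aligned 2D box.

    Assumes every corner lies in front of the camera.
    """
    homo = np.hstack([corners_xyz, np.ones((corners_xyz.shape[0], 1))])
    uvw = homo @ proj.T
    uv = uvw[:, :2] / uvw[:, 2:3]
    return np.array([uv[:, 0].min(), uv[:, 1].min(),
                     uv[:, 0].max(), uv[:, 1].max()])


def iou_2d(a, b):
    """Geometric consistency sketch: IoU of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0
```

In such a pipeline, the painted array would be passed to a LiDAR-based detector, and each resulting 3D candidate would be projected with `project_corners_to_box` and compared against the saved 2D candidates with `iou_2d`. How the consistency score then adjusts the final detections is left open here, since the abstract does not describe it.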

Author information

Authors and Affiliations

Academy for Engineering and Technology, Fudan University, Shanghai, 200433, China
Xuhua Chen & Xinhua Zeng
Electronics and Information Engineering, Shanghai University of Electric Power, Shanghai, 200000, China
Chengxin Pang
School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai, 200093, China
Xin Hu

Corresponding author

Correspondence to Xinhua Zeng.

Editor information

Editors and Affiliations

Indian Institute of Technology Indore, Indore, India
Mohammad Tanveer
Indian Institute of Information Technology - Allahabad, Prayagraj, India
Sonali Agarwal
Kobe University, Kobe, Japan
Seiichi Ozawa
Indian Institute of Technology Patna, Patna, India
Asif Ekbal
University of Innsbruck, Innsbruck, Austria
Adam Jatowt

About this paper

Cite this paper

Chen, X., Zeng, X., Pang, C., Hu, X. (2023). SGFusion: Camera-LiDAR Semantic and Geometric Fusion for 3D Object Detection. In: Tanveer, M., Agarwal, S., Ozawa, S., Ekbal, A., Jatowt, A. (eds) Neural Information Processing. ICONIP 2022. Lecture Notes in Computer Science, vol 13625. Springer, Cham. https://doi.org/10.1007/978-3-031-30111-7_37

DOI: https://doi.org/10.1007/978-3-031-30111-7_37
Published: 13 April 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-30110-0
Online ISBN: 978-3-031-30111-7
eBook Packages: Computer Science, Computer Science (R0)
