Abstract
Cameras and LiDAR are important sensors for achieving higher levels of autonomous driving, and the complementary information they provide offers more opportunities for improving performance. However, fusing them is difficult because their data representations differ. In this work, we propose a novel fusion framework, SGFusion, which fuses images and point clouds at both the semantic and the geometric level. SGFusion proceeds in two sequential stages: a semantic fusion stage and a geometric fusion stage. First, in the semantic fusion stage, the point clouds are painted with object-level semantic information obtained from a 2D object detector, and the detector's output candidates are saved. Then, the painted point clouds are fed to a LiDAR-based detector to obtain high-quality 3D detection candidates. Finally, in the geometric fusion stage, these 3D detection candidates are combined with the saved 2D detection candidates, and more accurate detection results are obtained by exploiting their geometric consistency. Experimental results on the KITTI detection benchmark show that SGFusion achieves up to 3.64% AP improvement on the car class over three different baselines. Furthermore, our method outperforms prior state-of-the-art works on 3D detection of the car class on the KITTI testing benchmark.
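The two stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 3x4 projection matrix `proj`, the choice to append one score channel per class, and the use of box IoU as the geometric-consistency measure are all assumptions made here for illustration. The first function paints each LiDAR point with the score of any 2D detection box it projects into; the other two project a 3D candidate's corners into the image and measure its overlap with a 2D candidate.

```python
import numpy as np

def project_to_image(points_xyz, proj):
    """Project (N, 3) points with an assumed 3x4 matrix; return pixels and a validity mask."""
    n = points_xyz.shape[0]
    homo = np.hstack([points_xyz, np.ones((n, 1))])   # homogeneous coordinates, (N, 4)
    uvw = homo @ proj.T                               # (N, 3)
    in_front = uvw[:, 2] > 1e-6                       # points behind the camera are invalid
    depth = np.where(in_front, uvw[:, 2], 1.0)        # avoid dividing by non-positive depth
    return uvw[:, :2] / depth[:, None], in_front

def paint_points(points_xyz, proj, boxes_2d, scores, labels, num_classes):
    """Semantic stage (sketch): append per-class 2D detection scores to each point."""
    uv, in_front = project_to_image(points_xyz, proj)
    sem = np.zeros((points_xyz.shape[0], num_classes))
    for (x1, y1, x2, y2), score, cls in zip(boxes_2d, scores, labels):
        inside = (in_front
                  & (uv[:, 0] >= x1) & (uv[:, 0] <= x2)
                  & (uv[:, 1] >= y1) & (uv[:, 1] <= y2))
        # Keep the highest score when several boxes of one class cover a point.
        sem[inside, cls] = np.maximum(sem[inside, cls], score)
    return np.hstack([points_xyz, sem])

def box_from_corners(corners_xyz, proj):
    """Axis-aligned image box enclosing the projected corners of a 3D box.

    Assumes all eight corners lie in front of the camera.
    """
    uv, _ = project_to_image(corners_xyz, proj)
    return np.array([uv[:, 0].min(), uv[:, 1].min(), uv[:, 0].max(), uv[:, 1].max()])

def iou_2d(a, b):
    """Geometric stage (sketch): IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0
```

In such a sketch, a 3D candidate whose projected box overlaps strongly with a saved 2D candidate would be treated as geometrically consistent; how the paper turns that consistency into final scores is not stated in the abstract and is left open here.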
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Chen, X., Zeng, X., Pang, C., Hu, X. (2023). SGFusion: Camera-LiDAR Semantic and Geometric Fusion for 3D Object Detection. In: Tanveer, M., Agarwal, S., Ozawa, S., Ekbal, A., Jatowt, A. (eds) Neural Information Processing. ICONIP 2022. Lecture Notes in Computer Science, vol 13625. Springer, Cham. https://doi.org/10.1007/978-3-031-30111-7_37
DOI: https://doi.org/10.1007/978-3-031-30111-7_37
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-30110-0
Online ISBN: 978-3-031-30111-7
eBook Packages: Computer Science, Computer Science (R0)