RPF3D: Range-Pillar Feature Deep Fusion 3D Detector for Autonomous Driving

Wang, Yihan; Yan, Qiao

doi:10.1007/978-981-99-8067-3_10

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14449)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

In this paper, we present RPF3D, a single-stage framework that exploits the complementary nature of point clouds and range images for 3D object detection. Our method addresses the sampling region imbalance inherent in fixed-dilation-rate convolutional layers, allowing for a more accurate representation of the input data. To enhance the model’s adaptability, we introduce several attention layers that accommodate the wide range of dilation rates needed to process range image scenes. To tackle the challenges of feature fusion and alignment, we propose the AttentiveFusion module and the Range Image Guided Deep Fusion (RIGDF) backbone architecture in the Range-Pillar Feature Fusion section, which together address the one-pillar-to-multiple-pixels feature alignment problem caused by the point cloud encoding strategy. These components work together to provide a more robust and accurate fusion of features for improved 3D object detection. We validate the effectiveness of the RPF3D framework through extensive experiments on the KITTI and Waymo Open Datasets. The results demonstrate the superior performance of our approach compared to existing methods, particularly for Car class detection, where a significant improvement is achieved on both datasets. This showcases the practical applicability of the proposed framework in real-world scenarios and its relevance to 3D object detection.
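The one-pillar-to-multiple-pixels correspondence mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the spherical projection, the sensor parameters (64 x 2048 image, +3 to -25 degree vertical field of view), the 0.16 m pillar size, and the function names are all assumptions chosen for illustration. It only shows that points stacked vertically inside one bird's-eye-view pillar land on several different range-image pixels.

```python
import numpy as np

# Hypothetical sensor and grid parameters, chosen only for illustration.
H, W = 64, 2048                      # range-image height and width
FOV_UP, FOV_DOWN = np.deg2rad(3.0), np.deg2rad(-25.0)
CELL = 0.16                          # pillar edge length in metres
X_MIN, Y_MIN = 0.0, -40.0            # origin of the bird's-eye-view grid


def range_image_pixels(points):
    """Spherical projection of (x, y, z) points to (row, col) pixel indices."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)          # assumes no point at the origin
    azimuth = np.arctan2(y, x)                  # in [-pi, pi]
    inclination = np.arcsin(z / r)
    col = np.floor(0.5 * (1.0 - azimuth / np.pi) * W).astype(int)
    row = np.floor((FOV_UP - inclination) / (FOV_UP - FOV_DOWN) * H).astype(int)
    return np.clip(row, 0, H - 1), np.clip(col, 0, W - 1)


def pillar_cells(points):
    """Bird's-eye-view pillar index (ix, iy) of each point; z is ignored."""
    ix = np.floor((points[:, 0] - X_MIN) / CELL).astype(int)
    iy = np.floor((points[:, 1] - Y_MIN) / CELL).astype(int)
    return ix, iy


def count_correspondence(points):
    """Return (number of distinct pillars, number of distinct pixels)."""
    rows, cols = range_image_pixels(points)
    ix, iy = pillar_cells(points)
    n_pillars = len(set(zip(ix.tolist(), iy.tolist())))
    n_pixels = len(set(zip(rows.tolist(), cols.tolist())))
    return n_pillars, n_pixels


# Four points on a vertical pole 10 m ahead of the sensor.
pole = np.array([[10.0, 0.05, z] for z in (-1.5, -1.0, -0.5, 0.0)])
n_pillars, n_pixels = count_correspondence(pole)
print(n_pillars, n_pixels)
```

Under these assumed parameters the four points share a single pillar but occupy four separate range-image rows, so any fusion scheme has to decide how several pixel features are combined into one pillar feature.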



Author information

Authors and Affiliations

Nanyang Technological University, Singapore, 639798, Singapore
Yihan Wang & Qiao Yan

Authors

Yihan Wang
Qiao Yan

Corresponding author

Correspondence to Qiao Yan.

Editor information

Editors and Affiliations

Central South University, Changsha, China
Biao Luo
Chinese Academy of Sciences, Beijing, China
Long Cheng
Zhejiang University, Hangzhou, China
Zheng-Guang Wu
Guangdong University of Technology, Guangzhou, China
Hongyi Li
UNSW Sydney, Sydney, NSW, Australia
Chaojie Li


Cite this paper

Wang, Y., Yan, Q. (2024). RPF3D: Range-Pillar Feature Deep Fusion 3D Detector for Autonomous Driving. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Lecture Notes in Computer Science, vol 14449. Springer, Singapore. https://doi.org/10.1007/978-981-99-8067-3_10


DOI: https://doi.org/10.1007/978-981-99-8067-3_10
Published: 16 November 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8066-6
Online ISBN: 978-981-99-8067-3
eBook Packages: Computer Science, Computer Science (R0)
