
SFSS-Net: shape-awared filter and sematic-ranked sampler for voxel-based 3D object detection

  • Original Article
  • Published in Neural Computing and Applications

Abstract

3D object detection is used in many fields, such as virtual reality, autonomous driving, and target tracking. 3D object detection methods usually take point clouds as input, but point clouds are unordered and rotation-invariant. To address this, voxel-based methods convert point clouds into voxels. However, raw point clouds contain a large number of background points that are irrelevant to the target and unhelpful for subsequent detection, yet voxel-based methods invariably feed them directly into the network. Moreover, voxel-based approaches model the point cloud as a large number of voxels of the same dimension, and the random sampling used during voxel partition weakens the voxel representation and degrades both the classifier and the box regressor. We therefore propose a plug-and-play module that contains a shape-aware filter (SAF) and a semantic-ranked sampler (SRS). SAF effectively removes part of the background points from the raw point cloud, which indirectly accelerates inference. SRS enhances the expressiveness of voxel features by retaining points of high confidence. Finally, we remove the conventional orientation classifier and propose a new loss, named ADIoU loss, to improve orientation estimation. Experiments on the KITTI car detection benchmark demonstrate that our method achieves faster inference and higher detection accuracy than state-of-the-art methods.
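The paper defines its own ranking signal for SRS; as a minimal illustrative sketch (not the authors' implementation), assuming a per-point semantic confidence score is available, retaining the top-ranked points in a voxel instead of sampling randomly might look like:

```python
import numpy as np

def semantic_ranked_sample(points, scores, max_points):
    """Keep the max_points highest-confidence points in one voxel.

    points : (N, C) array of per-point features in the voxel
    scores : (N,) per-point semantic confidence (hypothetical input;
             the paper derives its own ranking signal)
    """
    if len(points) <= max_points:
        return points
    order = np.argsort(scores)[::-1]       # rank by confidence, descending
    return points[order[:max_points]]      # retain the top-ranked points

def random_sample(points, max_points, seed=0):
    """Conventional random sampling during voxel partition, for contrast."""
    if len(points) <= max_points:
        return points
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(points), size=max_points, replace=False)
    return points[idx]
```

Under random sampling, a high-confidence foreground point can be dropped by chance; ranked retention keeps the most informative points, which is the intuition behind the stronger voxel representation claimed above.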


Data availability statements

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.


Acknowledgements

This work was supported by the National Key R &D Program of China (2019YFA0708300) and National Natural Science Foundation of China (Grant No. 52074323).

Author information


Corresponding author

Correspondence to Zhe Chen.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhu, L., Chen, Z., Wang, B. et al. SFSS-Net: shape-awared filter and sematic-ranked sampler for voxel-based 3D object detection. Neural Comput & Applic 35, 13417–13431 (2023). https://doi.org/10.1007/s00521-023-08382-7

