Abstract
3D object detection is used in many fields, such as virtual reality, autonomous driving, and target tracking. 3D object detection methods usually take point clouds as input, but point clouds are unordered and rotation-invariant, which makes them difficult to process directly. To handle this, voxel-based methods convert point clouds into voxels. However, raw point clouds contain a large number of background points that are irrelevant to the target and unhelpful for subsequent detection, and voxel-based methods invariably feed them directly into the network for 3D object detection. Moreover, voxel-based approaches model the point cloud data as a large set of voxels of identical dimensions, and the random sampling used during voxel partition weakens the voxel representation and degrades the performance of both the classifier and the box regressor. We therefore propose a plug-and-play module consisting of a shape-aware filter (SAF) and a semantic-ranked sampler (SRS). The SAF effectively removes a portion of the background points from the raw point cloud, which indirectly accelerates inference. The SRS enhances the expressiveness of voxel features by retaining points of high confidence. Finally, we remove the conventional orientation classifier and propose a new loss, named ADIoU loss, to improve orientation estimation. Experiments on the KITTI car detection benchmark demonstrate that our method achieves faster inference and higher detection accuracy than state-of-the-art methods.
Data availability statements
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
This work was supported by the National Key R&D Program of China (2019YFA0708300) and the National Natural Science Foundation of China (Grant No. 52074323).
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Zhu, L., Chen, Z., Wang, B. et al. SFSS-Net: shape-awared filter and sematic-ranked sampler for voxel-based 3D object detection. Neural Comput & Applic 35, 13417–13431 (2023). https://doi.org/10.1007/s00521-023-08382-7