Abstract
3D object detection is used in many fields, such as virtual reality, autonomous driving, and target tracking. 3D object detection methods usually take point clouds as input, but point clouds are unordered and rotation-invariant, which makes them difficult to process directly. To handle this, voxel-based methods convert point clouds into voxels. However, raw point clouds contain a large number of background points that are irrelevant to the target and unhelpful for subsequent detection, and voxel-based methods invariably feed them directly into the network for 3D object detection. Moreover, voxel-based approaches model the point cloud data as a large set of voxels of identical dimensions, and the random sampling used during voxel partition weakens the voxel representation and degrades the performance of both the classifier and the box regressor. We therefore propose a plug-and-play module consisting of a shape-aware filter (SAF) and a semantic-ranked sampler (SRS). The SAF effectively removes a portion of the background points from the raw point cloud, which indirectly accelerates inference. The SRS enhances the expressiveness of voxel features by retaining points of high confidence. Finally, we remove the conventional orientation classifier and propose a new loss, named ADIoU loss, to improve orientation estimation. Experiments on the KITTI car detection benchmark demonstrate that our method achieves faster inference and higher detection accuracy than state-of-the-art methods.
Data availability statements
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
This work was supported by the National Key R&D Program of China (2019YFA0708300) and the National Natural Science Foundation of China (Grant No. 52074323).
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Zhu, L., Chen, Z., Wang, B. et al. SFSS-Net: shape-awared filter and sematic-ranked sampler for voxel-based 3D object detection. Neural Comput & Applic 35, 13417–13431 (2023). https://doi.org/10.1007/s00521-023-08382-7