ESA-SSD: single-stage object detection network using deep hierarchical feature learning

Multimedia Tools and Applications

Abstract

Object detection, localization and identification based on 3D point cloud data are widely applied in autonomous driving, robotics, augmented reality and other fields. We propose ESA-SSD, a lightweight, single-stage 3D object detector that operates on raw point clouds and aggregates contextual features to key points using multi-set abstraction modules. First, an enhanced feature expression module encodes the geometric information and feature content of the points within each neighborhood to strengthen the feature expression of each point. Next, a semantic-aware sampling method performs down-sampling so that more foreground points are included among the key points, which reduces ineffective learning by the network. To focus the network on the key features of the object, a spatial attention mechanism is introduced to weight the features at each point. Finally, the center of each instance is estimated with a contextual instance centroid perception module so that the context features of the detected object are adequately extracted. We validated the method experimentally on the publicly available KITTI and DAIR-V2X datasets. On the KITTI dataset, ESA-SSD achieves detection accuracies of 88.58%, 80.26% and 76.80% for the car category at the easy, medium and difficult levels, respectively; on the DAIR-V2X dataset, the detection accuracies for the car category at the same three levels reach 73.21%, 61.62% and 56.99%.
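The semantic-aware sampling step described above can be illustrated with a minimal sketch. It assumes that each point already carries a predicted foreground score and that down-sampling simply keeps the k highest-scoring points; the abstract does not state the exact selection rule, so the function name `semantic_aware_sample`, its parameters and the top-k rule are illustrative assumptions rather than the authors' implementation.

```python
def semantic_aware_sample(points, fg_scores, k):
    """Keep the k points with the highest foreground score.

    Hypothetical helper: the top-k rule is an assumption made for
    illustration, not the selection rule confirmed by the paper.
    """
    if len(points) != len(fg_scores):
        raise ValueError("points and fg_scores must have the same length")
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    # Rank point indices by descending score; sorted() is stable,
    # so points with equal scores keep their input order.
    order = sorted(range(len(points)), key=lambda i: fg_scores[i], reverse=True)
    keep = order[:k]
    return [points[i] for i in keep], keep


# Toy input: four (x, y, z) points with made-up foreground scores.
points = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.5), (5.0, 1.0, 0.2), (9.0, 9.0, 1.0)]
fg_scores = [0.10, 0.92, 0.75, 0.05]

key_points, kept = semantic_aware_sample(points, fg_scores, k=2)
print(kept)        # indices of the retained key points
print(key_points)  # coordinates of the retained key points
```

Compared with purely geometric down-sampling, a score-driven rule of this kind biases the key points toward likely foreground regions, which is the effect the abstract attributes to the sampling stage.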

Data availability

All data in the manuscript can be found at: https://www.cvlibs.net/datasets/kitti/ and https://thudair.baai.ac.cn/index. All other data are available from the authors upon reasonable request.

Funding

This work was funded by the National Natural Science Foundation of China (Grant 62176018).

This work was funded by the University Innovation Fund of China for Production, Education and Research (Grant 2022IT229).

Author information

Corresponding author

Correspondence to Hui Liu.

Ethics declarations

Conflict of interest

All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, H., Dong, Z. ESA-SSD: single-stage object detection network using deep hierarchical feature learning. Multimed Tools Appl 83, 56207–56228 (2024). https://doi.org/10.1007/s11042-023-17754-z

  • DOI: https://doi.org/10.1007/s11042-023-17754-z
