Object detection, localization and identification based on 3D point cloud data are widely applied in autonomous driving, robotics, augmented reality and other fields. We propose ESA-SSD, a lightweight, single-stage 3D object detector that operates on raw point clouds and aggregates contextual features to key points using multi-set abstraction modules. First, an enhanced feature expression module encodes the geometric information and feature content of the points within each neighborhood to strengthen the feature expression of each point. Next, a semantic-aware sampling method performs down-sampling so that more foreground points are included among the key points, which reduces ineffective learning by the network. To focus the network on the key features of each object, a spatial attention mechanism is introduced to weight the features at each point. Finally, a contextual instance centroid perception module estimates the center of each instance so that the contextual features of the detected object are adequately extracted. We validated the method experimentally on the publicly available KITTI and DAIR-V2X datasets. On the KITTI dataset, ESA-SSD achieves detection accuracies of 88.58%, 80.26% and 76.80% for the car category at the easy, moderate and hard difficulty levels, respectively; on the DAIR-V2X dataset, the detection accuracies for the car category at the same three difficulty levels reach 73.21%, 61.62% and 56.99%.
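
The idea behind semantic-aware down-sampling can be illustrated with a minimal sketch. The abstract does not specify how ESA-SSD selects its key points, so the code below assumes a simple top-k rule: each point carries a foreground score (here a hand-written list standing in for a learned per-point semantic prediction), and the k highest-scoring points are kept. The function name `semantic_aware_sample` and its parameters are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def semantic_aware_sample(
    points: Sequence[Point], fg_scores: Sequence[float], k: int
) -> List[int]:
    """Return the indices of the k points with the highest foreground score.

    Assumption: a plain top-k selection, used only to illustrate the idea of
    biasing down-sampling toward foreground points.
    """
    if len(points) != len(fg_scores):
        raise ValueError("points and fg_scores must have the same length")
    if k < 0:
        raise ValueError("k must be non-negative")
    k = min(k, len(points))
    # Sort indices by descending score; Python's sort is stable, so ties
    # keep their original order.
    order = sorted(range(len(points)), key=lambda i: -fg_scores[i])
    return order[:k]


# Hypothetical toy input: five points with made-up foreground scores.
points = [(0.0, 0.0, 0.0), (1.0, 0.5, 0.2), (4.0, 2.0, 0.1),
          (1.1, 0.6, 0.3), (9.0, 9.0, 0.0)]
fg_scores = [0.1, 0.9, 0.4, 0.8, 0.2]
key_point_indices = semantic_aware_sample(points, fg_scores, k=2)
```

Compared with sampling that depends only on point geometry, a score-based rule of this kind keeps points that are likely to lie on objects, which is the effect the abstract describes as including more foreground points among the key points.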

Data availability
The datasets used in this manuscript are available at https://www.cvlibs.net/datasets/kitti/ and https://thudair.baai.ac.cn/index. All other data are available from the authors upon reasonable request.
This work was funded by the National Natural Science Foundation of China (Grant No. 62176018).
This work was funded by the University Innovation Fund of China for Production, Education and Research (Grant No. 2022IT229).
Liu, H., Dong, Z. ESA-SSD: single-stage object detection network using deep hierarchical feature learning. Multimed Tools Appl 83, 56207–56228 (2024). https://doi.org/10.1007/s11042-023-17754-z
