Abstract
3D object detection from point clouds has developed rapidly in recent years. However, real point cloud scenes contain many small objects that are difficult to detect because they are covered by only a few points, which limits overall detection accuracy. To address this issue, we propose a novel two-stage 3D object detection method that introduces an attention strategy to enhance the key structural information of objects and thereby improve overall detection accuracy, especially for small objects. Specifically, in the first stage, we apply the convolutional block attention module to the 3D sparse convolution layers to extract voxel features, and further employ the Swin Transformer to enhance the Bird’s Eye View (BEV) features for generating high-quality proposals. In the second stage, a Voxel Set Abstraction (VSA) module fuses the voxel features and BEV features into keypoint features, followed by a Region of Interest (RoI) pooling module that produces grid features for confidence prediction and box regression. Experimental results on the KITTI dataset show that our method, IMAM, achieves excellent detection performance, especially for small objects such as pedestrians and cyclists.
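To make the attention strategy more concrete, the following is a minimal PyTorch sketch of a standard convolutional block attention module (CBAM: channel attention followed by spatial attention, as in Woo et al.) applied to a dense 2D feature map such as a BEV grid. The abstract does not specify how IMAM attaches CBAM to the 3D sparse convolution layers or the exact tensor shapes, so the module layout, hyperparameters (reduction ratio, kernel size), and the toy input below are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    # Channel attention: global avg/max pooling -> shared MLP -> sigmoid gate.
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):                      # x: (B, C, H, W)
        avg = self.mlp(x.mean(dim=(2, 3)))     # (B, C)
        mx = self.mlp(x.amax(dim=(2, 3)))      # (B, C)
        gate = torch.sigmoid(avg + mx).unsqueeze(-1).unsqueeze(-1)
        return x * gate


class SpatialAttention(nn.Module):
    # Spatial attention: channel-wise avg/max maps -> 7x7 conv -> sigmoid gate.
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):                      # x: (B, C, H, W)
        avg = x.mean(dim=1, keepdim=True)      # (B, 1, H, W)
        mx = x.amax(dim=1, keepdim=True)       # (B, 1, H, W)
        gate = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * gate


class CBAM(nn.Module):
    # Sequential channel-then-spatial attention (Woo et al., ECCV 2018).
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.ca = ChannelAttention(channels, reduction)
        self.sa = SpatialAttention(kernel_size)

    def forward(self, x):
        return self.sa(self.ca(x))


if __name__ == "__main__":
    # Toy BEV-like feature map: batch 2, 256 channels, 200x176 grid (assumed sizes).
    feats = torch.randn(2, 256, 200, 176)
    refined = CBAM(256)(feats)
    print(refined.shape)  # torch.Size([2, 256, 200, 176])

In the paper's pipeline this kind of gating would re-weight voxel or BEV features before proposal generation, so that the sparse points on small objects contribute more strongly to the final feature map.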
References
Zhou, Y., Tuzel, O.: Voxelnet: end-to-end learning for point cloud based 3d object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4490–4499 (2018). https://doi.org/10.1109/CVPR.2018.00472
Yan, Y., Mao, Y., Li, B.: Second: sparsely embedded convolutional detection. Sensors 18(10), 3337 (2018)
Lang, A.H., Vora, S., Caesar, H., Zhou, L., Yang, J., Beijbom, O.: Pointpillars: fast encoders for object detection from point clouds. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12697–12705 (2019). https://doi.org/10.1109/CVPR.2019.01298
Yang, Z., Sun, Y., Liu, S., Jia, J.: 3dssd: point-based 3d single stage object detector. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11040–11048 (2020). https://doi.org/10.1109/CVPR42600.2020.01105
He, C., Zeng, H., Huang, J., Hua, X.S., Zhang, L.: Structure aware single-stage 3d object detection from point cloud. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11873–11882 (2020). https://doi.org/10.1109/CVPR42600.2020.01189
Shi, S., Wang, X., Li, H.: Pointrcnn: 3d object proposal generation and detection from point cloud. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 770–779 (2019)
Shi, S., et al.: Pv-rcnn: point-voxel feature set abstraction for 3d object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10529–10538 (2020). https://doi.org/10.1109/CVPR42600.2020.01054
Yang, Z., Sun, Y., Liu, S., Shen, X., Jia, J.: Std: sparse-to-dense 3d object detector for point cloud. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1951–1960 (2019). https://doi.org/10.1109/ICCV.2019.00204
Shi, S., Wang, Z., Wang, X., Li, H.: Part-A2 net: 3d part-aware and aggregation neural network for object detection from point cloud. arXiv preprint arXiv:1907.03670 (2019). https://doi.org/10.1109/TPAMI.2020.2977026
Chen, Y., Liu, S., Shen, X., Jia, J.: Fast point r-cnn. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9775–9784 (2019)
Vaswani, A., et al.: Attention is all you need. Adv. Neural Inf. Process. Syst. 30, 1–15 (2017)
Woo, S., Park, J., Lee, J.-Y., Kweon, I.S.: CBAM: convolutional block attention module. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11211, pp. 3–19. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01234-2_1
Liu, Z., et al.: Swin transformer: hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021). https://doi.org/10.1109/ICCV48922.2021.00986
Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12346, pp. 213–229. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58452-8_13
Dosovitskiy, A., et al.: An image is worth 16x16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)
Acknowledgment
This work is supported by the National Natural Science Foundation of China (No. 62106086) and the Natural Science Foundation of Hubei Province (No. 2021CFB564).
About this paper
Cite this paper
Zhou, J., Wu, H. (2023). IMAM: Incorporating Multiple Attention Mechanisms for 3D Object Detection from Point Cloud. In: Iliadis, L., Papaleonidas, A., Angelov, P., Jayne, C. (eds) Artificial Neural Networks and Machine Learning – ICANN 2023. ICANN 2023. Lecture Notes in Computer Science, vol 14260. Springer, Cham. https://doi.org/10.1007/978-3-031-44195-0_10