Abstract
3D object detection from point clouds has developed rapidly in recent years. However, real point cloud scenes contain many small objects that are difficult to detect because they are covered by only a few points, which limits overall detection accuracy. To address this issue, we propose a novel two-stage 3D object detection method that introduces an attention strategy to enhance the key structural information of objects and thereby improve overall detection accuracy, especially for small objects. Specifically, in the first stage, we apply the convolutional block attention module to the 3D sparse convolution layers to extract voxel features, and further employ the Swin Transformer to enhance the Bird’s Eye View (BEV) features for generating high-quality proposals. In the second stage, a Voxel Set Abstraction (VSA) module fuses the voxel features and BEV features into keypoint features, followed by a Region of Interest (RoI) pooling module that produces grid features for confidence prediction and box regression. Experimental results on the KITTI dataset show that our method, IMAM, achieves excellent detection performance, especially for small objects such as pedestrians and cyclists.
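To make the attention strategy more concrete, the following is a minimal PyTorch sketch of a standard convolutional block attention module (CBAM: channel attention followed by spatial attention, as in Woo et al.) applied to a dense 2D feature map such as a BEV grid. The abstract does not specify how IMAM attaches CBAM to the 3D sparse convolution layers or the exact tensor shapes, so the module layout, hyperparameters (reduction ratio, kernel size), and the toy input below are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    # Channel attention: global avg/max pooling -> shared MLP -> sigmoid gate.
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):                      # x: (B, C, H, W)
        avg = self.mlp(x.mean(dim=(2, 3)))     # (B, C)
        mx = self.mlp(x.amax(dim=(2, 3)))      # (B, C)
        gate = torch.sigmoid(avg + mx).unsqueeze(-1).unsqueeze(-1)
        return x * gate


class SpatialAttention(nn.Module):
    # Spatial attention: channel-wise avg/max maps -> 7x7 conv -> sigmoid gate.
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):                      # x: (B, C, H, W)
        avg = x.mean(dim=1, keepdim=True)      # (B, 1, H, W)
        mx = x.amax(dim=1, keepdim=True)       # (B, 1, H, W)
        gate = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * gate


class CBAM(nn.Module):
    # Sequential channel-then-spatial attention (Woo et al., ECCV 2018).
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.ca = ChannelAttention(channels, reduction)
        self.sa = SpatialAttention(kernel_size)

    def forward(self, x):
        return self.sa(self.ca(x))


if __name__ == "__main__":
    # Toy BEV-like feature map: batch 2, 256 channels, 200x176 grid (assumed sizes).
    feats = torch.randn(2, 256, 200, 176)
    refined = CBAM(256)(feats)
    print(refined.shape)  # torch.Size([2, 256, 200, 176])

In the paper's pipeline this kind of gating would re-weight voxel or BEV features before proposal generation, so that the sparse points on small objects contribute more strongly to the final feature map.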
References
Zhou, Y., Tuzel, O.: Voxelnet: end-to-end learning for point cloud based 3d object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4490–4499 (2018). https://doi.org/10.1109/CVPR.2018.00472
Yan, Y., Mao, Y., Li, B.: Second: sparsely embedded convolutional detection. Sensors 18(10), 3337 (2018)
Lang, A.H., Vora, S., Caesar, H., Zhou, L., Yang, J., Beijbom, O.: Pointpillars: fast encoders for object detection from point clouds. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12697–12705 (2019). https://doi.org/10.1109/CVPR.2019.01298
Yang, Z., Sun, Y., Liu, S., Jia, J.: 3dssd: point-based 3d single stage object detector. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11040–11048 (2020). https://doi.org/10.1109/CVPR42600.2020.01105
He, C., Zeng, H., Huang, J., Hua, X.S., Zhang, L.: Structure aware single-stage 3d object detection from point cloud. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11873–11882 (2020). https://doi.org/10.1109/CVPR42600.2020.01189
Shi, S., Wang, X., Li, H.: Pointrcnn: 3d object proposal generation and detection from point cloud. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 770–779 (2019)
Shi, S., et al.: Pv-rcnn: point-voxel feature set abstraction for 3d object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10529–10538 (2020). https://doi.org/10.1109/CVPR42600.2020.01054
Yang, Z., Sun, Y., Liu, S., Shen, X., Jia, J.: Std: sparse-to-dense 3d object detector for point cloud. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1951–1960 (2019). https://doi.org/10.1109/ICCV.2019.00204
Shi, S., Wang, Z., Wang, X., Li, H.: Part-A2 net: 3d part-aware and aggregation neural network for object detection from point cloud. arXiv preprint arXiv:1907.03670 (2019). https://doi.org/10.1109/TPAMI.2020.2977026
Chen, Y., Liu, S., Shen, X., Jia, J.: Fast point r-cnn. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9775–9784 (2019)
Vaswani, A., et al.: Attention is all you need. Adv. Neural Inf. Process. Syst. 30, 1–15 (2017)
Woo, S., Park, J., Lee, J.-Y., Kweon, I.S.: CBAM: convolutional block attention module. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11211, pp. 3–19. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01234-2_1
Liu, Z., et al.: Swin transformer: hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021). https://doi.org/10.1109/ICCV48922.2021.00986
Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12346, pp. 213–229. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58452-8_13
Dosovitskiy, A., et al.: An image is worth 16x16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)
Acknowledgment
This work is supported by the National Natural Science Foundation of China (No. 62106086) and the Natural Science Foundation of Hubei Province (No. 2021CFB564).
About this paper
Cite this paper
Zhou, J., Wu, H. (2023). IMAM: Incorporating Multiple Attention Mechanisms for 3D Object Detection from Point Cloud. In: Iliadis, L., Papaleonidas, A., Angelov, P., Jayne, C. (eds) Artificial Neural Networks and Machine Learning – ICANN 2023. ICANN 2023. Lecture Notes in Computer Science, vol 14260. Springer, Cham. https://doi.org/10.1007/978-3-031-44195-0_10