
IMAM: Incorporating Multiple Attention Mechanisms for 3D Object Detection from Point Cloud

  • Conference paper

In: Artificial Neural Networks and Machine Learning – ICANN 2023 (ICANN 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14260)


Abstract

3D object detection from point clouds is developing rapidly. However, real point-cloud scenes contain many small objects that are hard to detect because they are represented by only a few points, which hinders overall detection accuracy. To address this issue, we propose a novel two-stage 3D object detection method that introduces an attention strategy to enhance the key structural information of objects, improving overall detection accuracy, especially for small objects. Specifically, in the first stage, we employ the convolutional block attention module on the 3D sparse convolution layers to extract voxel features, and further apply the Swin Transformer to enhance the Bird's Eye View (BEV) features for generating high-quality proposals. In the second stage, we apply a Voxel Set Abstraction (VSA) module to fuse voxel features and BEV features into keypoint features, followed by a Region of Interest (RoI) pooling module that produces grid features for confidence prediction and box regression. Experimental results on the KITTI dataset show that our method, IMAM, achieves excellent detection performance, especially for small-sized pedestrians and cyclists.
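The convolutional block attention module (CBAM) used in the first stage applies channel attention followed by spatial attention to a feature map. The abstract gives only the high-level pipeline, so the sketch below is an assumption-laden NumPy illustration of the CBAM idea, not the paper's implementation: the weight names (`w1`, `w2`, `w_sp`) are made up, and the spatial branch mixes the two channel-pooled maps with a per-pixel linear combination where real CBAM uses a 7×7 convolution.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """feat: (C, H, W). A shared two-layer MLP (w1, w2) scores
    average- and max-pooled channel descriptors; their sum, squashed
    by a sigmoid, rescales each channel."""
    avg = feat.mean(axis=(1, 2))                      # (C,)
    mx = feat.max(axis=(1, 2))                        # (C,)
    att = sigmoid(w2 @ np.maximum(w1 @ avg, 0.0)
                  + w2 @ np.maximum(w1 @ mx, 0.0))    # (C,) in (0, 1)
    return feat * att[:, None, None]

def spatial_attention(feat, w_sp):
    """Pool along the channel axis, then mix the two (H, W) maps.
    Real CBAM applies a 7x7 conv here; a linear mix keeps the sketch short."""
    avg = feat.mean(axis=0)                           # (H, W)
    mx = feat.max(axis=0)                             # (H, W)
    att = sigmoid(w_sp[0] * avg + w_sp[1] * mx)       # (H, W) in (0, 1)
    return feat * att[None, :, :]

def cbam(feat, w1, w2, w_sp):
    """Channel attention first, then spatial attention, as in CBAM."""
    return spatial_attention(channel_attention(feat, w1, w2), w_sp)
```

Because both attention maps lie in (0, 1), CBAM only reweights features, never amplifies them; in the paper's setting this lets the network emphasize the few voxels that carry a small object's structure while suppressing background.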


References

  1. Zhou, Y., Tuzel, O.: VoxelNet: end-to-end learning for point cloud based 3D object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4490–4499 (2018). https://doi.org/10.1109/CVPR.2018.00472

  2. Yan, Y., Mao, Y., Li, B.: SECOND: sparsely embedded convolutional detection. Sensors 18(10), 3337 (2018)


  3. Lang, A.H., Vora, S., Caesar, H., Zhou, L., Yang, J., Beijbom, O.: Pointpillars: fast encoders for object detection from point clouds. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12697–12705 (2019). https://doi.org/10.1109/CVPR.2019.01298

  4. Yang, Z., Sun, Y., Liu, S., Jia, J.: 3dssd: point-based 3d single stage object detector. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11040–11048 (2020). https://doi.org/10.1109/CVPR42600.2020.01105

  5. He, C., Zeng, H., Huang, J., Hua, X.S., Zhang, L.: Structure aware single-stage 3d object detection from point cloud. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11873–11882 (2020). https://doi.org/10.1109/CVPR42600.2020.01189

  6. Shi, S., Wang, X., Li, H.: Pointrcnn: 3d object proposal generation and detection from point cloud. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 770–779 (2019)


  7. Shi, S., et al.: Pv-rcnn: point-voxel feature set abstraction for 3d object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10529–10538 (2020). https://doi.org/10.1109/CVPR42600.2020.01054

  8. Yang, Z., Sun, Y., Liu, S., Shen, X., Jia, J.: Std: sparse-to-dense 3d object detector for point cloud. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1951–1960 (2019). https://doi.org/10.1109/ICCV.2019.00204

  9. Shi, S., Wang, Z., Wang, X., Li, H.: Part-A2 net: 3d part-aware and aggregation neural network for object detection from point cloud. arXiv preprint arXiv:1907.03670 (2019). https://doi.org/10.1109/TPAMI.2020.2977026

  10. Chen, Y., Liu, S., Shen, X., Jia, J.: Fast point r-cnn. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9775–9784 (2019)


  11. Vaswani, A., et al.: Attention is all you need. Adv. Neural. Inf. Process. Syst. 30, 1–15 (2017)


  12. Woo, S., Park, J., Lee, J.-Y., Kweon, I.S.: CBAM: convolutional block attention module. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11211, pp. 3–19. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01234-2_1


  13. Liu, Z., et al.: Swin transformer: hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021). https://doi.org/10.1109/ICCV48922.2021.00986

  14. Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12346, pp. 213–229. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58452-8_13


  15. Dosovitskiy, A., et al.: An image is worth 16x16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)


Acknowledgment

This work is supported by the National Natural Science Foundation of China (No. 62106086) and the Natural Science Foundation of Hubei Province (No. 2021CFB564).

Author information

Corresponding author: Jing Zhou.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Zhou, J., Wu, H. (2023). IMAM: Incorporating Multiple Attention Mechanisms for 3D Object Detection from Point Cloud. In: Iliadis, L., Papaleonidas, A., Angelov, P., Jayne, C. (eds) Artificial Neural Networks and Machine Learning – ICANN 2023. ICANN 2023. Lecture Notes in Computer Science, vol 14260. Springer, Cham. https://doi.org/10.1007/978-3-031-44195-0_10

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-44195-0_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-44194-3

  • Online ISBN: 978-3-031-44195-0

  • eBook Packages: Computer Science, Computer Science (R0)
