Three-Dimensional Object Detection Network Based on Coordinate Attention and Overlapping Region Penalty Mechanisms

Li, Wenxin; Zhu, Shiyu; Liu, Hongzhi; Zhang, Pinzheng; Zhang, Xiaoqin

doi:10.1007/978-3-031-47665-5_7

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14408)

Included in the following conference series:

Asian Conference on Pattern Recognition

Abstract

Three-dimensional (3D) object detection is a key technology for autonomous driving and robot control, with applications such as self-driving cars and unmanned aircraft systems. To improve detection accuracy, this paper proposes a 3D object detection network that combines a coordinate attention training mechanism, which generates voting feature points for better detection ability, with an overlapping region penalty mechanism, which reduces false detections. In comparative experiments on two large-scale public 3D datasets, ScanNet and SUN RGB-D, the proposed method obtained a mean average precision (mAP) of 60.1% and 58.0%, respectively, at an intersection-over-union (IoU) threshold of 0.25, which demonstrates its advantage over mainstream algorithms such as F-PointNet, VoxelNet and MV3D. The improved method is expected to achieve higher accuracy for 3D object detection relying only on point cloud information.
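
To make the 0.25 criterion concrete, the following is a minimal sketch of how a predicted 3D box might be matched against a ground-truth box when computing mAP. It is not the authors' evaluation code. It assumes axis-aligned boxes encoded as (xmin, ymin, zmin, xmax, ymax, zmax), whereas benchmark boxes may be oriented, and the function names `iou_3d_axis_aligned` and `is_match` are hypothetical.

```python
from typing import Sequence


def iou_3d_axis_aligned(box_a: Sequence[float], box_b: Sequence[float]) -> float:
    """Intersection over union of two axis-aligned 3D boxes.

    Each box is (xmin, ymin, zmin, xmax, ymax, zmax); this encoding is an
    assumption made for illustration only.
    """
    intersection = 1.0
    for axis in range(3):
        low = max(box_a[axis], box_b[axis])
        high = min(box_a[axis + 3], box_b[axis + 3])
        if high <= low:
            # No overlap along this axis means no overlap at all.
            return 0.0
        intersection *= high - low

    def volume(box: Sequence[float]) -> float:
        return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])

    union = volume(box_a) + volume(box_b) - intersection
    return intersection / union


def is_match(pred: Sequence[float], gt: Sequence[float], threshold: float = 0.25) -> bool:
    """A prediction counts as a match when its IoU reaches the threshold."""
    return iou_3d_axis_aligned(pred, gt) >= threshold


if __name__ == "__main__":
    predicted = (1.0, 0.0, 0.0, 3.0, 2.0, 2.0)
    ground_truth = (0.0, 0.0, 0.0, 2.0, 2.0, 2.0)
    print(iou_3d_axis_aligned(predicted, ground_truth))
    print(is_match(predicted, ground_truth))
```

In this example the two boxes share half of each box's volume, giving an IoU of one third, so the prediction would count as a match at the 0.25 threshold but not at a stricter threshold of 0.5.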

Acknowledgments

This research was supported by the Zhejiang Provincial Natural Science Foundation of China (Grant No. LQ21A040007) and the Scientific Research Fund of Zhejiang Provincial Education Department (Grant No. Y201941856).

Author information

Authors and Affiliations

School of Information and Electronic Engineering (Sussex Artificial Intelligence Institute), Zhejiang Gongshang University, Hangzhou, 310018, China
Wenxin Li & Xiaoqin Zhang
National Key Laboratory of Transient Physics, Nanjing University of Science and Technology, Nanjing, 210094, China
Shiyu Zhu
School of Computer Science and Engineering, Southeast University, Nanjing, 210096, China
Hongzhi Liu & Pinzheng Zhang

Corresponding author

Correspondence to Xiaoqin Zhang.

Editor information

Editors and Affiliations

Kyushu Institute of Technology, Kitakyushu, Fukuoka, Japan
Huimin Lu
The University of Sydney, Sydney, NSW, Australia
Michael Blumenstein
Yonsei University, Seoul, Korea (Republic of)
Sung-Bae Cho
Chinese Academy of Sciences, Beijing, China
Cheng-Lin Liu
Osaka University, Osaka, Ibaraki, Japan
Yasushi Yagi
Kyushu Institute of Technology, Kitakyushu, Japan
Tohru Kamiya

About this paper

Cite this paper

Li, W., Zhu, S., Liu, H., Zhang, P., Zhang, X. (2023). Three-Dimensional Object Detection Network Based on Coordinate Attention and Overlapping Region Penalty Mechanisms. In: Lu, H., Blumenstein, M., Cho, SB., Liu, CL., Yagi, Y., Kamiya, T. (eds) Pattern Recognition. ACPR 2023. Lecture Notes in Computer Science, vol 14408. Springer, Cham. https://doi.org/10.1007/978-3-031-47665-5_7

DOI: https://doi.org/10.1007/978-3-031-47665-5_7
Published: 05 November 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-47664-8
Online ISBN: 978-3-031-47665-5
eBook Packages: Computer Science, Computer Science (R0)
