3DDACNN: 3D dense attention convolutional neural network for point cloud based object recognition

Han, Xian-Feng; Huang, Xin-Yi; Sun, Shi-Jie; Wang, Ming-Jie

doi:10.1007/s10462-022-10165-w

Published: 11 March 2022

Artificial Intelligence Review, volume 55, pages 6655–6671 (2022)

Xian-Feng Han¹,
Xin-Yi Huang¹,
Shi-Jie Sun ORCID: orcid.org/0000-0003-4043-8448² &
…
Ming-Jie Wang³

Abstract

Deep CNN-based methods have recently achieved significant success on a wide range of 2D computer vision tasks. Directly processing 3D point clouds with CNNs, however, remains challenging: point clouds are irregular, and as a result overall performance is still far from optimal. In this paper, we propose a novel trainable architecture for 3D point cloud based object recognition that, for the first time, approaches the problem from the perspective of both network depth and attention mechanisms. We first transform the input point cloud into a regular volumetric representation using a binary occupancy grid. The resulting grid is then fed into our proposed 3D Dense-Attention CNN framework, dubbed 3DDACNN, to obtain features with enhanced representational power. Extensive experiments on highly challenging datasets demonstrate the effectiveness of our proposed model, which achieves remarkable performance.
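
The voxelization step described in the abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `voxelize_occupancy`, the default resolution of 32 voxels per axis, and the uniform min-max normalization are all assumptions made for this example, since the abstract does not specify them.

```python
import numpy as np


def voxelize_occupancy(points: np.ndarray, resolution: int = 32) -> np.ndarray:
    """Map an (N, 3) point cloud to a binary occupancy grid.

    Hypothetical helper: the name, the default resolution, and the
    normalization scheme are illustrative assumptions, not taken from the paper.
    """
    points = np.asarray(points, dtype=np.float64)
    if points.ndim != 2 or points.shape[1] != 3:
        raise ValueError("points must have shape (N, 3)")

    grid = np.zeros((resolution, resolution, resolution), dtype=np.uint8)
    if points.shape[0] == 0:
        return grid  # an empty cloud gives an empty grid

    # Shift the cloud so its bounding box starts at the origin.
    mins = points.min(axis=0)
    # Use one scale for all axes so the object's aspect ratio is preserved.
    extent = (points.max(axis=0) - mins).max()
    scale = (resolution - 1) / extent if extent > 0 else 0.0

    # Convert continuous coordinates to integer voxel indices.
    idx = np.floor((points - mins) * scale).astype(np.int64)
    idx = np.clip(idx, 0, resolution - 1)

    # A voxel is 1 if at least one point falls inside it, else 0.
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1
    return grid


# Usage: voxelize a random cloud of 1024 points.
cloud = np.random.default_rng(0).random((1024, 3))
grid = voxelize_occupancy(cloud)
```

The resulting grid has a fixed shape regardless of how many points the cloud contains, which is what lets a standard 3D convolution consume it. The cost is that several points falling in one voxel collapse to a single occupied cell.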

Acknowledgements

This research was supported by the National Natural Science Foundation of China (No. 62002299), the Natural Science Foundation of Chongqing, China (No. cstc2020jcyj-msxmX0126), the Fundamental Research Funds for the Central Universities (No. SWU120005), the National Natural Science Foundation of China (Grant No. 62006026), and the Central Universities Basic Research Special Funds (Grant No. 300102241202).

Author information

Authors and Affiliations

College of Computer and Information Science, Southwest University, Tiansheng Road, Chongqing, 400715, China
Xian-Feng Han & Xin-Yi Huang
College of Information and Engineering, Changan University, Nan’er Huan Road, Xi’an, 710064, ShaanXi, China
Shi-Jie Sun
Department of Computer Science, Memorial University of Newfoundland, St. John’s, Canada
Ming-Jie Wang

Corresponding author

Correspondence to Shi-Jie Sun.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Han, XF., Huang, XY., Sun, SJ. et al. 3DDACNN: 3D dense attention convolutional neural network for point cloud based object recognition. Artif Intell Rev 55, 6655–6671 (2022). https://doi.org/10.1007/s10462-022-10165-w

Published: 11 March 2022
Issue Date: December 2022
DOI: https://doi.org/10.1007/s10462-022-10165-w
