
3DDACNN: 3D dense attention convolutional neural network for point cloud based object recognition

Published in: Artificial Intelligence Review

Abstract

Recently, deep CNN-based methods have achieved significant success on a variety of 2D computer vision tasks. However, directly processing 3D point clouds with CNNs remains challenging because of their irregular structure, which keeps overall performance far from optimal. In this paper, we propose a novel trainable architecture for 3D point cloud based object recognition that, for the first time, jointly exploits network depth and an attention mechanism. We first transform the input point cloud into a regular volumetric representation using a binary occupancy grid. The result is then fed into our proposed 3D Dense-Attention CNN framework, dubbed \(\mathbf{3DDACNN}\), to obtain features with enhanced representational power. Extensive experiments on highly challenging datasets demonstrate the effectiveness of the proposed model, which achieves remarkable performance.
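The voxelisation step described in the abstract admits a straightforward implementation. The sketch below (NumPy) shows one way a binary occupancy grid could be built from a raw point cloud before it is passed to a 3D CNN; the 32^3 resolution and the normalisation of the cloud into the unit cube are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def to_occupancy_grid(points: np.ndarray, resolution: int = 32) -> np.ndarray:
    """Convert an (N, 3) point cloud into a binary occupancy grid.

    A voxel is set to 1 if at least one point falls inside it, 0 otherwise.
    The grid resolution (e.g. 32^3) is a hypothetical choice for illustration.
    """
    # Normalise the cloud into the unit cube [0, 1]^3.
    mins = points.min(axis=0)
    extent = points.max(axis=0) - mins
    extent[extent == 0] = 1.0                      # guard against degenerate axes
    normalised = (points - mins) / extent

    # Map normalised coordinates to voxel indices in [0, resolution - 1].
    idx = np.clip((normalised * resolution).astype(int), 0, resolution - 1)

    # Mark occupied cells.
    grid = np.zeros((resolution, resolution, resolution), dtype=np.float32)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
    return grid

# Usage: voxelise a (random) cloud before feeding it to the volumetric network.
cloud = np.random.rand(2048, 3).astype(np.float32)
voxels = to_occupancy_grid(cloud, resolution=32)
print(voxels.shape, int(voxels.sum()))             # (32, 32, 32) and the occupied-cell count
```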




Acknowledgements

This research was supported by the National Natural Science Foundation of China (No. 62002299), the Natural Science Foundation of Chongqing, China (No. cstc2020jcyj-msxmX0126), the Fundamental Research Funds for the Central Universities (No. SWU120005), the National Natural Science Foundation of China (Grant No. 62006026), and the Central Universities Basic Research Special Funds (Grant No. 300102241202).

Author information


Corresponding author

Correspondence to Shi-Jie Sun.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Han, XF., Huang, XY., Sun, SJ. et al. 3DDACNN: 3D dense attention convolutional neural network for point cloud based object recognition. Artif Intell Rev 55, 6655–6671 (2022). https://doi.org/10.1007/s10462-022-10165-w

