Abstract
Deep CNN-based methods have recently achieved significant success on a variety of 2D computer vision problems. However, directly processing 3D point clouds with CNNs remains challenging because point clouds are irregular, and overall performance is still far from optimal. In this paper, we propose a novel trainable architecture for 3D point cloud based object recognition that, for the first time, approaches the task from the perspective of both network depth and attention mechanisms. We first transform the input point cloud into a regular volumetric representation using a binary occupancy grid strategy. The result is then fed into our proposed 3D Dense-Attention CNN framework, dubbed \(\mathbf{3DDACNN}\), to obtain features with enhanced representation power. Extensive experiments on highly challenging datasets demonstrate the effectiveness and strong performance of the proposed model.
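To make the voxelization step concrete, the following is a minimal sketch of a binary occupancy grid in Python with NumPy. It is an illustration under stated assumptions, not the authors' implementation: the function name `binary_occupancy_grid`, the default grid resolution of 32, and the uniform rescaling of the cloud into the grid are all hypothetical choices that the abstract does not specify.

```python
import numpy as np


def binary_occupancy_grid(points: np.ndarray, resolution: int = 32) -> np.ndarray:
    """Voxelize an (N, 3) point cloud into a binary occupancy grid.

    Assumptions (not taken from the paper): the cloud is rescaled uniformly
    so that its longest side spans the grid, and a voxel is set to 1 if at
    least one point falls inside it, 0 otherwise.
    """
    points = np.asarray(points, dtype=np.float64)
    if points.ndim != 2 or points.shape[1] != 3:
        raise ValueError("points must have shape (N, 3)")

    grid = np.zeros((resolution, resolution, resolution), dtype=np.uint8)
    if points.shape[0] == 0:
        return grid  # empty cloud: nothing is occupied

    # Shift the cloud to the origin and measure its longest side.
    mins = points.min(axis=0)
    extent = (points.max(axis=0) - mins).max()
    if extent == 0:
        extent = 1.0  # all points coincide; avoid division by zero

    # Uniform scaling keeps the aspect ratio of the object.
    scaled = (points - mins) / extent * (resolution - 1)
    idx = np.clip(np.floor(scaled).astype(np.int64), 0, resolution - 1)

    # Mark every voxel that contains at least one point as occupied.
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1
    return grid


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cloud = rng.normal(size=(2048, 3))  # synthetic stand-in for a scanned object
    voxels = binary_occupancy_grid(cloud, resolution=32)
    print(voxels.shape, int(voxels.sum()))
```

The resulting array has a fixed shape regardless of how many points the input contains, which is what allows a regular 3D convolutional network to consume it.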





Acknowledgements
This research was supported by the National Natural Science Foundation of China (No. 62002299), the Natural Science Foundation of Chongqing, China (No. cstc2020jcyj-msxmX0126), the Fundamental Research Funds for the Central Universities (No. SWU120005), the National Natural Science Foundation of China (Grant No. 62006026), and the Central Universities Basic Research Special Funds (Grant No. 300102241202).
About this article
Cite this article
Han, XF., Huang, XY., Sun, SJ. et al. 3DDACNN: 3D dense attention convolutional neural network for point cloud based object recognition. Artif Intell Rev 55, 6655–6671 (2022). https://doi.org/10.1007/s10462-022-10165-w
DOI: https://doi.org/10.1007/s10462-022-10165-w