Global Hierarchical Attention for 3D Point Cloud Analysis

Jia, Dan; Hermans, Alexander; Leibe, Bastian

doi:10.1007/978-3-031-16788-1_17

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13485)

Included in the following conference series:

DAGM German Conference on Pattern Recognition

Abstract

We propose a new attention mechanism, called Global Hierarchical Attention (GHA), for 3D point cloud analysis. GHA approximates the regular global dot-product attention via a series of coarsening and interpolation operations over multiple hierarchy levels. The advantage of GHA is two-fold. First, it has linear complexity with respect to the number of points, enabling the processing of large point clouds. Second, GHA inherently possesses the inductive bias to focus on spatially close points, while retaining the global connectivity among all points. Combined with a feedforward network, GHA can be inserted into many existing network architectures. We experiment with multiple baseline networks and show that adding GHA consistently improves performance across different tasks and datasets. For the task of semantic segmentation, GHA gives a +1.7% mIoU increase to the MinkowskiEngine baseline on ScanNet. For the 3D object detection task, GHA improves the CenterPoint baseline by +0.5% mAP on the nuScenes dataset, and the 3DETR baseline by +2.1% mAP$_{25}$ and +1.5% mAP$_{50}$ on ScanNet.
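The abstract describes GHA as approximating regular global dot-product attention through coarsening and interpolation. To make that idea concrete, here is a toy two-level sketch in plain Python. It is an illustrative assumption, not the authors' GHA. It uses raw features as queries, keys and values (no learned projections), uses voxel-grid averaging as the coarsening step, and uses copy-back as the interpolation step. The function names `attend` and `two_level_attention` and the `cell_size` parameter are hypothetical. Each point attends only to points in its own grid cell, while a second attention over the cell averages carries global context back to every point.

```python
import math
from collections import defaultdict


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(queries, keys, values):
    # Regular scaled dot-product attention: every query sees every key.
    scale = 1.0 / math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        w = softmax([dot(q, k) * scale for k in keys])
        out.append([sum(wi * v[c] for wi, v in zip(w, values))
                    for c in range(len(values[0]))])
    return out


def mean(vectors):
    n = len(vectors)
    return [sum(v[c] for v in vectors) / n for c in range(len(vectors[0]))]


def two_level_attention(coords, feats, cell_size):
    # Hypothetical two-level toy, NOT the paper's GHA.
    # Coarsening: bucket point indices into a voxel grid of side `cell_size`.
    cells = defaultdict(list)
    for i, p in enumerate(coords):
        cells[tuple(math.floor(c / cell_size) for c in p)].append(i)
    cell_ids = list(cells)
    # Coarse level: average features per cell, then attend over all cells.
    coarse = [mean([feats[i] for i in cells[cid]]) for cid in cell_ids]
    coarse_out = attend(coarse, coarse, coarse)
    out = [None] * len(feats)
    for cid, ctx in zip(cell_ids, coarse_out):
        idx = cells[cid]
        local = [feats[i] for i in idx]
        # Fine level: attention restricted to points in the same cell.
        fine = attend(local, local, local)
        # Interpolation: copy the cell's coarse result back to its points
        # and add it to each point's fine-level result.
        for i, f in zip(idx, fine):
            out[i] = [a + b for a, b in zip(f, ctx)]
    return out


if __name__ == "__main__":
    coords = [(0.1, 0.2, 0.0), (0.3, 0.1, 0.0), (2.5, 2.5, 0.0), (2.7, 2.4, 0.0)]
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
    print(two_level_attention(coords, feats, cell_size=1.0))
```

With only two levels, the cost is roughly the sum of n_c² over cells plus C² for C cells of n_c points each. That is not linear in the number of points. Per the abstract, GHA achieves linear complexity by working over multiple hierarchy levels, which this sketch does not reproduce.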

References

Armeni, I., et al.: 3D semantic parsing of large-scale indoor spaces. In: CVPR (2016)
Caesar, H., et al.: nuScenes: a multimodal dataset for autonomous driving. In: CVPR (2020)
Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12346, pp. 213–229. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58452-8_13
Caron, M., et al.: Emerging properties in self-supervised vision transformers. In: ICCV (2021)
Chen, C., Chen, Z., Zhang, J., Tao, D.: SASA: semantics-augmented set abstraction for point-based 3D object detection. In: AAAI (2022)
Cheng, B., Sheng, L., Shi, S., Yang, M., Xu, D.: Back-tracing representative points for voting-based 3D object detection in point clouds. In: CVPR (2021)
Choromanski, K., et al.: Rethinking attention with performers. In: ICLR (2020)
Choy, C., Gwak, J., Savarese, S.: 4D spatio-temporal ConvNets: Minkowski convolutional neural networks. In: CVPR (2019)
Choy, C., Park, J., Koltun, V.: Fully convolutional geometric features. In: ICCV (2019)
Contributors, M.: MMDetection3D: OpenMMLab next-generation platform for general 3D object detection (2020). https://github.com/open-mmlab/mmdetection3d
Dai, A., Chang, A.X., Savva, M., Halber, M., Funkhouser, T., Nießner, M.: ScanNet: richly-annotated 3D reconstructions of indoor scenes. In: CVPR (2017)
Dosovitskiy, A., et al.: An image is worth 16x16 words: transformers for image recognition at scale. In: ICLR (2021)
Fan, H., Yang, L., Kankanhalli, M.: Point 4D transformer networks for spatio-temporal modeling in point cloud videos. In: CVPR (2021)
Geiger, A., Lenz, P., Stiller, C., Urtasun, R.: Vision meets robotics: the KITTI dataset. IJRR 32(11), 1231–1237 (2013)
Graham, B., Engelcke, M., van der Maaten, L.: 3D semantic segmentation with submanifold sparse convolutional networks. In: CVPR (2018)
Guo, M.-H., Cai, J.-X., Liu, Z.-N., Mu, T.-J., Martin, R.R., Hu, S.-M.: PCT: point cloud transformer. Comput. Visual Media 7(2), 187–199 (2021). https://doi.org/10.1007/s41095-021-0229-5
Jiang, L., Zhao, H., Liu, S., Shen, X., Fu, C.W., Jia, J.: Hierarchical point-edge interaction network for point cloud semantic segmentation. In: ICCV (2019)
Kanezaki, A., Matsushita, Y., Nishida, Y.: RotationNet for joint object categorization and unsupervised pose estimation from multi-view images. PAMI 43 (2021)
Landrieu, L., Boussaha, M.: Point cloud oversegmentation with graph-structured deep metric learning. In: CVPR (2019)
Landrieu, L., Simonovsky, M.: Large-scale point cloud semantic segmentation with superpoint graphs. In: CVPR (2018)
Lang, A.H., Vora, S., Caesar, H., Zhou, L., Yang, J., Beijbom, O.: PointPillars: fast encoders for object detection from point clouds. In: CVPR (2019)
Li, Y., Bu, R., Sun, M., Wu, W., Di, X., Chen, B.: PointCNN: convolution on X-transformed points. In: NeurIPS (2018)
Li, G., et al.: DeepGCNs: making GCNs go as deep as CNNs. PAMI (2021)
Liu, Z., et al.: Swin transformer: hierarchical vision transformer using shifted windows. In: ICCV (2021)
Liu, Z., Zhang, Z., Cao, Y., Hu, H., Tong, X.: Group-free 3D object detection via transformers. In: ICCV (2021)
Mao, J., Wang, X., Li, H.: Interpolated convolutional networks for 3D point cloud understanding. In: ICCV (2019)
Mao, J., et al.: Voxel transformer for 3D object detection. In: ICCV (2021)
Misra, I., Girdhar, R., Joulin, A.: An end-to-end transformer model for 3D object detection. In: ICCV (2021)
Pan, X., Xia, Z., Song, S., Li, L.E., Huang, G.: 3D object detection with pointformer. In: CVPR (2021)
Park, C., Jeong, Y., Cho, M., Park, J.: Fast point transformer. arXiv:2112.04702 (2021)
Qi, C.R., Litany, O., He, K., Guibas, L.J.: Deep Hough voting for 3D object detection in point clouds. In: ICCV (2019)
Qi, C.R., Liu, W., Wu, C., Su, H., Guibas, L.J.: Frustum PointNets for 3D object detection from RGB-D data. In: CVPR (2018)
Qi, C.R., Su, H., Mo, K., Guibas, L.J.: PointNet: deep learning on point sets for 3D classification and segmentation. In: CVPR (2017)
Qi, C.R., Yi, L., Su, H., Guibas, L.J.: PointNet++: deep hierarchical feature learning on point sets in a metric space. In: NeurIPS (2017)
Qian, X., et al.: MLCVNet: multi-level context VoteNet for 3D object detection. In: CVPR (2020)
Raghu, M., Unterthiner, T., Kornblith, S., Zhang, C., Dosovitskiy, A.: Do vision transformers see like convolutional neural networks? arXiv:2108.08810 (2021)
Riegler, G., Ulusoy, A.O., Geiger, A.: OctNet: learning deep 3D representations at high resolutions. In: CVPR (2017)
Shi, S., Guo, C., Jiang, L., Wang, Z., Shi, J., Wang, X., Li, H.: PV-RCNN: point-voxel feature set abstraction for 3D object detection. In: CVPR (2020)
Shi, S., Wang, X., Li, H.: PointRCNN: 3D object proposal generation and detection from point cloud. In: CVPR (2019)
Shi, S., Wang, Z., Shi, J., Wang, X., Li, H.: From points to parts: 3D object detection from point cloud with part-aware and part-aggregation network. PAMI (2020)
Su, H., et al.: SPLATNet: sparse lattice networks for point cloud processing. In: CVPR (2018)
Tancik, M., et al.: Fourier features let networks learn high frequency functions in low dimensional domains. In: NeurIPS (2020)
Tatarchenko, M., Park, J., Koltun, V., Zhou, Q.Y.: Tangent convolutions for dense prediction in 3D. In: CVPR (2018)
Tay, Y., Dehghani, M., Bahri, D., Metzler, D.: Efficient transformers: a survey. arXiv:2009.06732 (2020)
Thomas, H., Qi, C., Deschaud, J.E., Marcotegui, B., Goulette, F., Guibas, L.: KPConv: flexible and deformable convolution for point clouds. In: ICCV (2019)
Vaswani, A., et al.: Attention is all you need. In: NeurIPS (2017)
Wang, L., Huang, Y., Hou, Y., Zhang, S., Shan, J.: Graph attention convolution for point cloud semantic segmentation. In: CVPR (2019)
Wang, S., Li, B.Z., Khabsa, M., Fang, H., Ma, H.: Linformer: self-attention with linear complexity. arXiv:2006.04768 (2020)
Wang, Y., Solomon, J.M.: Deep closest point: learning representations for point cloud registration. In: ICCV (2019)
Wang, Y., Sun, Y., Liu, Z., Sarma, S.E., Bronstein, M.M., Solomon, J.M.: Dynamic graph CNN for learning on point clouds. ACM Trans. Graphics 38(5) (2019). https://doi.org/10.1145/3326362
Wu, W., Qi, Z., Fuxin, L.: PointConv: deep convolutional networks on 3D point clouds. In: CVPR (2019)
Wu, Z., et al.: 3D ShapeNets: a deep representation for volumetric shapes. In: CVPR (2015)
Xie, Q., et al.: VENet: voting enhancement network for 3D object detection. In: ICCV (2021)
Xu, Y., Fan, T., Xu, M., Zeng, L., Qiao, Y.: SpiderCNN: deep learning on point sets with parameterized convolutional filters. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11212, pp. 90–105. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01237-3_6
Yan, Y.: SpConv: spatially sparse convolution library. https://github.com/traveller59/spconv. Accessed 04 Mar 2022
Yan, Y., Mao, Y., Li, B.: SECOND: sparsely embedded convolutional detection. Sensors (2018)
Yang, Z., Sun, Y., Liu, S., Jia, J.: 3DSSD: point-based 3D single stage object detector. In: CVPR (2020)
Ye, S., Chen, D., Han, S., Liao, J.: Learning with noisy labels for robust point cloud segmentation. In: ICCV (2021)
Yi, L., et al.: A scalable active framework for region annotation in 3D shape collections. ACM Trans. Graphics 35 (2016)
Yin, T., Zhou, X., Krähenbühl, P.: Center-based 3D object detection and tracking. In: CVPR (2021)
Zaheer, M., et al.: Big bird: transformers for longer sequences. In: NeurIPS (2020)
Zeng, A., Song, S., Nießner, M., Fisher, M., Xiao, J., Funkhouser, T.: 3DMatch: learning local geometric descriptors from RGB-D reconstructions. In: CVPR (2017)
Zhang, Z., Sun, B., Yang, H., Huang, Q.: H3DNet: 3D object detection using hybrid geometric primitives. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12357, pp. 311–329. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58610-2_19
Zhao, H., Jia, J., Koltun, V.: Exploring self-attention for image recognition. In: CVPR (2020)
Zhao, H., Jiang, L., Fu, C.W., Jia, J.: PointWeb: enhancing local neighborhood features for point cloud processing. In: CVPR (2019)
Zhao, H., Jiang, L., Jia, J., Torr, P.H., Koltun, V.: Point transformer. In: ICCV (2021)
Zhou, Y., et al.: End-to-end multi-view fusion for 3D object detection in LiDAR point clouds. In: CoRL (2019)
Zhou, Y., Tuzel, O.: VoxelNet: end-to-end learning for point cloud based 3D object detection. In: CVPR (2017)
Zhu, Z., Soricut, R.: H-Transformer-1D: fast one-dimensional hierarchical attention for sequences. In: ACL (2021)

Acknowledgements

This project was funded by the BMBF project 6GEM (16KISK036K) and the ERC Consolidator Grant DeeVise (ERC-2017-COG-773161). We thank Jonas Schult, Markus Knoche, Ali Athar, and Christian Schmidt for helpful discussions.

Author information

Authors and Affiliations

Visual Computing Institute, RWTH Aachen University, Aachen, Germany
Dan Jia, Alexander Hermans & Bastian Leibe

Corresponding author

Correspondence to Dan Jia.

Editor information

Editors and Affiliations

TU Dresden, Dresden, Germany
Björn Andres
University of Bonn, Bonn, Germany
Florian Bernard
Technical University of Munich, Munich, Germany
Daniel Cremers
University of Hamburg, Hamburg, Germany
Simone Frintrop
University of Konstanz, Konstanz, Germany
Bastian Goldlücke
University of Siegen, Siegen, Germany
Ivo Ihrke

Electronic supplementary material

Supplementary material 1 (PDF, 2699 KB)

About this paper

Cite this paper

Jia, D., Hermans, A., Leibe, B. (2022). Global Hierarchical Attention for 3D Point Cloud Analysis. In: Andres, B., Bernard, F., Cremers, D., Frintrop, S., Goldlücke, B., Ihrke, I. (eds) Pattern Recognition. DAGM GCPR 2022. Lecture Notes in Computer Science, vol 13485. Springer, Cham. https://doi.org/10.1007/978-3-031-16788-1_17

DOI: https://doi.org/10.1007/978-3-031-16788-1_17
Published: 20 September 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16787-4
Online ISBN: 978-3-031-16788-1
eBook Packages: Computer Science, Computer Science (R0)
