
ReAGFormer: Reaggregation Transformer with Affine Group Features for 3D Object Detection

  • Conference paper
  • Published in: Computer Vision – ACCV 2022 (ACCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13841)


Abstract

Direct detection of 3D objects from point clouds is challenging due to the sparsity and irregularity of point clouds. To capture point features from raw point clouds for 3D object detection, most previous work uses PointNet and its variants as the feature learning backbone, with encouraging results. However, these methods capture point features independently, without modeling interactions between points, and their simple symmetric aggregation functions cannot adequately capture the local contextual features that are vital for 3D object recognition. To address these limitations, we propose ReAGFormer, a reaggregation Transformer backbone with affine group features for point feature learning in 3D object detection, which captures dependencies between points in an aligned group feature space while retaining flexible receptive fields. The key idea of ReAGFormer is to alleviate perturbations of the point feature space through an affine transformation, extract dependencies between points with self-attention, and reaggregate local point set features using the learned attention. We also design multi-scale connections in the feature propagation layer to reduce the geometric information loss caused by point sampling and interpolation. Experimental results show that, with our method as the backbone of existing 3D object detectors, the resulting models achieve significant improvements over the originals and state-of-the-art performance on the SUN RGB-D and ScanNet V2 benchmarks.
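
This page contains no implementation, so the following is a minimal PyTorch-style sketch of the core mechanism the abstract describes: affine alignment of a local group's features, self-attention between points in the group, and attention-weighted reaggregation in place of a symmetric pooling function. Everything here is an assumption inferred from the abstract; the module name, the per-channel alpha/beta affine parameters, the use of nn.MultiheadAttention, the softmax scoring head, and all shapes are illustrative, not the authors' implementation.

```python
# Illustrative sketch only -- NOT the authors' code. The affine alignment,
# attention layout, and learned reaggregation below are assumptions based
# solely on the description in the abstract.
import torch
import torch.nn as nn

class AffineGroupReaggregation(nn.Module):
    """Align each local group of point features with a learnable affine
    transform, model dependencies between points with self-attention, and
    reaggregate the group with learned weights instead of max-pooling."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        # Learnable per-channel scale and shift (hypothetical design).
        self.alpha = nn.Parameter(torch.ones(dim))
        self.beta = nn.Parameter(torch.zeros(dim))
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.score = nn.Linear(dim, 1)  # per-point aggregation weight

    def forward(self, group_feats: torch.Tensor) -> torch.Tensor:
        # group_feats: (batch * n_groups, k_neighbors, dim)
        # 1) Affine alignment: normalize each group to reduce perturbation
        #    of the feature space across different local neighborhoods.
        mu = group_feats.mean(dim=1, keepdim=True)
        sigma = group_feats.std(dim=1, keepdim=True) + 1e-5
        aligned = self.alpha * (group_feats - mu) / sigma + self.beta
        # 2) Self-attention captures dependencies between points in a group.
        attended, _ = self.attn(aligned, aligned, aligned)
        # 3) Reaggregation: a learned softmax weighting over the k neighbors
        #    replaces a fixed symmetric max/avg pooling.
        weights = torch.softmax(self.score(attended), dim=1)  # (BG, k, 1)
        return (weights * attended).sum(dim=1)                # (BG, dim)

# Example: 32 local groups of 16 neighbors with 128-channel features each.
groups = torch.randn(32, 16, 128)
print(AffineGroupReaggregation(dim=128)(groups).shape)  # torch.Size([32, 128])
```

The point of the sketch is the replacement of a fixed symmetric aggregation function with an attention-derived, data-dependent weighting over an affine-normalized group; the multi-scale feature propagation connections mentioned in the abstract are omitted here.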



Acknowledgements

This work was supported by the Key-Area Research and Development Program of Guangdong Province (No. 2019B010139004) and the National Natural Science Foundation Youth Fund (No. 62007001).

Author information


Corresponding author

Correspondence to Yue Liu.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 22,946 KB)


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Lu, C., Yue, K., Liu, Y. (2023). ReAGFormer: Reaggregation Transformer with Affine Group Features for 3D Object Detection. In: Wang, L., Gall, J., Chin, TJ., Sato, I., Chellappa, R. (eds) Computer Vision – ACCV 2022. ACCV 2022. Lecture Notes in Computer Science, vol 13841. Springer, Cham. https://doi.org/10.1007/978-3-031-26319-4_16

  • DOI: https://doi.org/10.1007/978-3-031-26319-4_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-26318-7

  • Online ISBN: 978-3-031-26319-4

  • eBook Packages: Computer Science, Computer Science (R0)
