RepPVConv: attentively fusing reparameterized voxel features for efficient 3D point cloud perception

Original article · The Visual Computer

Abstract

Designing efficient deep learning models for 3D point clouds is an important research topic. Point-voxel convolution (Liu et al. in NeurIPS, 2019) is a pioneering approach in this direction, but it still leaves considerable room for improvement in performance, since it stacks quite a few layers of simple 3D convolutions and fuses point and voxel features with a linear operation. To resolve these issues, we propose a novel reparameterizable point-voxel convolution (RepPVConv) block. First, RepPVConv adopts two reparameterizable 3D convolution modules to extract more informative voxel features without introducing any extra computational overhead at inference time. The rationale is that these modules are trained in a high-capacity mode but are reparameterized into a low-capacity mode for inference while losslessly maintaining the trained performance. Second, RepPVConv attentively fuses the reparameterized voxel features with the point features. Since this fusion operates in a nonlinear manner, the descriptive reparameterized voxel features can be better utilized. Extensive experimental results show that RepPVConv-based networks are efficient in terms of both GPU memory consumption and computational complexity and significantly outperform state-of-the-art methods.
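
To make the reparameterization idea concrete, below is a minimal PyTorch sketch of a RepVGG-style [9] reparameterizable 3D convolution: during training it sums a 3×3×3 branch, a 1×1×1 branch, and an identity branch (each with batch normalization), and for inference the three branches are algebraically folded into a single 3×3×3 convolution. The class name `RepConv3d` and its exact structure are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RepConv3d(nn.Module):
    # Hypothetical sketch of a RepVGG-style reparameterizable 3D convolution;
    # not the authors' exact module. It trains as a three-branch block and is
    # folded into a single 3x3x3 convolution for inference.

    def __init__(self, channels):
        super().__init__()
        self.conv3 = nn.Conv3d(channels, channels, 3, padding=1, bias=False)
        self.bn3 = nn.BatchNorm3d(channels)
        self.conv1 = nn.Conv3d(channels, channels, 1, bias=False)
        self.bn1 = nn.BatchNorm3d(channels)
        self.bn_id = nn.BatchNorm3d(channels)  # identity (skip) branch
        self.fused = None                      # set by reparameterize()

    def forward(self, x):
        if self.fused is not None:
            return F.relu(self.fused(x))       # low-capacity inference mode
        # High-capacity training mode: sum of the three branches.
        return F.relu(self.bn3(self.conv3(x))
                      + self.bn1(self.conv1(x))
                      + self.bn_id(x))

    @staticmethod
    def _fold_bn(weight, bn):
        # Fold BatchNorm statistics into an equivalent conv weight and bias.
        std = (bn.running_var + bn.eps).sqrt()
        scale = (bn.weight / std).reshape(-1, 1, 1, 1, 1)
        return weight * scale, bn.bias - bn.running_mean * bn.weight / std

    @torch.no_grad()
    def reparameterize(self):
        # Merge the branches into one 3x3x3 conv; this is exact once BN uses
        # its running statistics (i.e., the module is in eval mode).
        c = self.conv3.in_channels
        w3, b3 = self._fold_bn(self.conv3.weight, self.bn3)
        w1, b1 = self._fold_bn(self.conv1.weight, self.bn1)
        w1 = F.pad(w1, [1, 1, 1, 1, 1, 1])     # 1x1x1 kernel -> 3x3x3
        w_id = torch.zeros_like(self.conv3.weight)
        w_id[torch.arange(c), torch.arange(c), 1, 1, 1] = 1.0  # identity kernel
        w_id, b_id = self._fold_bn(w_id, self.bn_id)
        self.fused = nn.Conv3d(c, c, 3, padding=1)
        self.fused.weight.copy_(w3 + w1 + w_id)
        self.fused.bias.copy_(b3 + b1 + b_id)
```

Calling `reparameterize()` after training leaves a single plain 3D convolution, which is why the inference cost can match that of a simple convolution while training benefits from the higher-capacity multi-branch form.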
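
Similarly, the attentive point-voxel fusion can be sketched as a small gating network that predicts per-point weights for the two feature streams, replacing the simple addition used in the original point-voxel convolution [21]. Again, `AttentiveFusion` and its details are assumptions for illustration, not the authors' exact design.

```python
import torch
import torch.nn as nn


class AttentiveFusion(nn.Module):
    # Hypothetical sketch of nonlinear, attention-weighted point-voxel
    # fusion. Both inputs are per-point features of shape (N, C), with the
    # voxel features already interpolated (devoxelized) back onto the N points.

    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(2 * channels, channels),
            nn.ReLU(inplace=True),
            nn.Linear(channels, 2),            # one logit per branch
        )

    def forward(self, point_feat, voxel_feat):
        w = torch.softmax(
            self.gate(torch.cat([point_feat, voxel_feat], dim=-1)), dim=-1)
        # Convex, input-dependent combination instead of a fixed linear sum.
        return w[:, 0:1] * point_feat + w[:, 1:2] * voxel_feat
```

Because the weights depend on the input features, the fusion is nonlinear: descriptive voxel features can dominate where they are more informative, and point features elsewhere.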

References

  1. Armeni, I., Sener, O., Zamir, A.R., et al.: 3d semantic parsing of large-scale indoor spaces. In: CVPR, pp. 1534–1543 (2016)

  2. Armeni, I., Sax, S., Zamir, A.R., et al.: Joint 2d-3d-semantic data for indoor scene understanding. arXiv preprint arXiv:1702.01105 (2017)

  3. Bronstein, M.M., Bruna, J., LeCun, Y., et al.: Geometric deep learning: going beyond Euclidean data. IEEE Signal Process. Mag. 34(4), 18–42 (2017)

  4. Chang, A.X., Funkhouser, T., Guibas, L., et al.: Shapenet: an information-rich 3d model repository. arXiv preprint arXiv:1512.03012 (2015)

  5. Chen, L., Zhang, Q.: Ddgcn: graph convolution network based on direction and distance for point cloud learning. Vis. Comput. 1–11 (2022). https://doi.org/10.1007/s00371-021-02351-8

  6. Chen, Y., Peng, W., Tang, K., et al.: Pyrapvconv: efficient 3d point cloud perception with pyramid voxel convolution and sharable attention. Comput. Intell. Neurosci. 2022, 1–9 (2022). https://doi.org/10.1155/2022/2286818

  7. Choy, C., Gwak, J., Savarese, S.: 4d spatio-temporal convnets: Minkowski convolutional neural networks. In: CVPR, pp. 3075–3084 (2019)

  8. Çiçek, Ö., Abdulkadir, A., Lienkamp, S.S., et al.: 3d u-net: learning dense volumetric segmentation from sparse annotation. In: MICCAI, pp. 424–432 (2016)

  9. Ding, X., Zhang, X., Ma, N., et al.: Repvgg: making vgg-style convnets great again. In: CVPR (2021)

  10. Engelcke, M., Rao, D., Wang, D.Z., et al.: Vote3deep: fast object detection in 3d point clouds using efficient convolutional neural networks. In: ICRA, pp. 1355–1361. IEEE (2017)

  11. Geiger, A., Lenz, P., Stiller, C., et al.: Vision meets robotics: the kitti dataset. IJRR 32(11), 1231–1237 (2013)

  12. Graham, B., Engelcke, M., Van Der Maaten, L.: 3d semantic segmentation with submanifold sparse convolutional networks. In: CVPR, pp. 9224–9232 (2018)

  13. Guo, Y., Bennamoun, M., Sohel, F., et al.: 3d object recognition in cluttered scenes with local surface features: a survey. IEEE TPAMI 36(11), 2270–2287 (2014)

  14. Guo, Y., Wang, H., Hu, Q., et al.: Deep learning for 3d point clouds: a survey. IEEE Trans. Pattern Anal. Mach. Intell. 43(12), 4338–4364 (2020)

  15. Hu, Q., Yang, B., Xie, L., et al.: Randla-net: efficient semantic segmentation of large-scale point clouds. In: CVPR, pp. 11,108–11,117 (2020)

  16. Ioannidou, A., Chatzilari, E., Nikolopoulos, S., et al.: Deep learning advances in computer vision with 3d data: a survey. ACM Comput. Surv. (CSUR) 50(2), 1–38 (2017)

  17. Kingma, D.P., Welling, M., et al.: An introduction to variational autoencoders. Found. Trends® Mach. Learn. 12(4), 307–392 (2019)

  18. Li, B.: 3d fully convolutional network for vehicle detection in point cloud. In: IROS, pp. 1513–1518 (2017)

  19. Li, Y., Bu, R., Sun, M., et al.: Pointcnn: convolution on x-transformed points. In: NeurIPS, pp. 820–830 (2018)

  20. Lin, N., Li, Y., Tang, K., et al.: Manipulation planning from demonstration via goal-conditioned prior action primitive decomposition and alignment. IEEE Robot. Autom. Lett. 7(2), 1387–1394 (2022)

  21. Liu, Z., Tang, H., Lin, Y., et al.: Point-voxel CNN for efficient 3d deep learning. In: NeurIPS (2019)

  22. Maturana, D., Scherer, S.: Voxnet: a 3d convolutional neural network for real-time object recognition. In: IROS, pp. 922–928 (2015)

  23. Noh, J., Lee, S., Ham, B.: Hvpr: hybrid voxel-point representation for single-stage 3d object detection. In: CVPR, pp. 14,605–14,614 (2021)

  24. Paszke, A., Gross, S., Massa, F., et al.: Pytorch: an imperative style, high-performance deep learning library. In: NeurIPS, pp. 8026–8037 (2019)

  25. Qi, C.R., Su, H., Mo, K., et al.: Pointnet: deep learning on point sets for 3d classification and segmentation. In: CVPR, pp. 652–660 (2017a)

  26. Qi, C.R., Yi, L., Su, H., et al.: Pointnet++: deep hierarchical feature learning on point sets in a metric space. In: NeurIPS, pp. 5105–5114 (2017b)

  27. Qi, C.R., Liu, W., Wu, C., et al.: Frustum pointnets for 3d object detection from rgb-d data. In: CVPR, pp. 918–927 (2018)

  28. Que, Z., Lu, G., Xu, D.: Voxelcontext-net: an octree based framework for point cloud compression. In: CVPR, pp. 6042–6051 (2021)

  29. Riegler, G., Osman Ulusoy, A., Geiger, A.: Octnet: learning deep 3d representations at high resolutions. In: CVPR, pp. 3577–3586 (2017)

  30. Shi, S., Guo, C., Jiang, L., et al.: PV-RCNN: point-voxel feature set abstraction for 3d object detection. In: CVPR, pp. 10,529–10,538 (2020)

  31. Shi, S., Jiang, L., Deng, J., et al.: PV-RCNN++: point-voxel feature set abstraction with local vector representation for 3d object detection. arXiv preprint arXiv:2102.00463 (2021)

  32. Shi, W., Rajkumar, R.: Point-GNN: graph neural network for 3d object detection in a point cloud. In: CVPR, pp. 1711–1719 (2020)

  33. Song, S., Xiao, J.: Sliding shapes for 3d object detection in depth images. In: ECCV, pp. 634–651 (2014)

  34. Song, S., Xiao, J.: Deep sliding shapes for amodal 3d object detection in rgb-d images. In: CVPR, pp. 808–816 (2016)

  35. Su, H., Maji, S., Kalogerakis, E., et al.: Multi-view convolutional neural networks for 3d shape recognition. In: ICCV, pp. 945–953 (2015)

  36. Sun, Y., Miao, Y., Chen, J., et al.: Pgcnet: patch graph convolutional network for point cloud segmentation of indoor scenes. Vis. Comput. 36(10), 2407–2418 (2020)

  37. Tang, H., Liu, Z., Zhao, S., et al.: Searching efficient 3d architectures with sparse point-voxel convolution. In: ECCV, pp. 685–702 (2020)

  38. Tang, K., Ma, Y., Miao, D., et al.: Decision fusion networks for image classification. IEEE Trans. Neural Netw. Learn. Syst. (2022). https://doi.org/10.1109/TNNLS.2022.3196129

  39. Thomas, H., Qi, C.R., Deschaud, J.E., et al.: Kpconv: flexible and deformable convolution for point clouds. In: ICCV, pp. 6411–6420 (2019)

  40. Vaswani, A., Shazeer, N., Parmar, N., et al.: Attention is all you need. In: NeurIPS (2017)

  41. Veit, A., Wilber, M., Belongie, S.: Residual networks behave like ensembles of relatively shallow networks. In: NeurIPS, pp. 550–558 (2016)

  42. Wang, P.S., Liu, Y., Guo, Y.X., et al.: O-cnn: octree-based convolutional neural networks for 3d shape analysis. ACM TOG 36(4), 1–11 (2017)

  43. Wang, P.S., Sun, C.Y., Liu, Y., et al.: Adaptive o-cnn: a patch-based deep representation of 3d shapes. ACM TOG 37(6), 1–11 (2018)

  44. Wang, Y., Sun, Y., Liu, Z., et al.: Dynamic graph cnn for learning on point clouds. ACM TOG (SIGGRAPH) 38(5), 1–12 (2019)

  45. Wei, Y., Wang, Z., Rao, Y., et al.: Pv-raft: point-voxel correlation fields for scene flow estimation of point clouds. In: CVPR, pp. 6954–6963 (2021)

  46. Wu, W., Qi, Z., Fuxin, L.: Pointconv: deep convolutional networks on 3d point clouds. In: CVPR, pp. 9621–9630 (2019)

  47. Wu, Z., Song, S., Khosla, A., et al.: 3d shapenets: a deep representation for volumetric shapes. In: CVPR, pp. 1912–1920 (2015)

  48. Xu, M., Ding, R., Zhao, H., et al.: Paconv: position adaptive convolution with dynamic kernel assembling on point clouds. In: CVPR (2021)

  49. Zagoruyko, S., Komodakis, N.: Diracnets: training very deep neural networks without skip-connections. arXiv preprint arXiv:1706.00388 (2017)

  50. Zhang, F., Fang, J., Wah, B.W., et al.: Deep fusionnet for point cloud semantic segmentation. In: ECCV, pp. 644–663 (2020)

  51. Zhao, H., Jiang, L., Fu, C.W., et al.: Pointweb: enhancing local neighborhood features for point cloud processing. In: CVPR, pp. 5565–5573 (2019)

  52. Zhao, H., Jiang, L., Jia, J., et al.: Point transformer. In: ICCV, pp. 16,259–16,268 (2021)

  53. Zhou, Y., Tuzel, O.: Voxelnet: end-to-end learning for point cloud based 3d object detection. In: CVPR, pp. 4490–4499 (2018)

Acknowledgements

We thank the reviewers for their valuable comments. This work was supported in part by the National Natural Science Foundation of China (62102105, 62072126), the Guangdong Basic and Applied Basic Research Foundation (2020A1515110997, 2022A1515011501, and 2022A1515010138), the Science and Technology Program of Guangzhou (202002030263, 202102010419, and 202201020229), and the Open Project Program of the State Key Lab of CAD and CG (A2218), Zhejiang University.

Author information

Corresponding authors

Correspondence to Weilong Peng or Meie Fang.

Ethics declarations

Conflict of interest

We declare that we have no competing financial interests or personal relationships that could have appeared to influence the work reported in this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Tang, K., Chen, Y., Peng, W. et al. RepPVConv: attentively fusing reparameterized voxel features for efficient 3D point cloud perception. Vis Comput 39, 5577–5588 (2023). https://doi.org/10.1007/s00371-022-02682-0
