Abstract
Point cloud analysis is challenging because of the unordered and irregular structure of point cloud data. To describe geometric information in point clouds, existing methods mainly use convolution, graph, and attention operations to construct sophisticated local aggregation operators. These operators extract local information well but incur unfavorable inference latency due to their high computational complexity. To address this problem, this paper presents a novel point-voxel based geometry-adaptive network (PVGANet), which combines point and voxel representations to describe the point cloud at different granularities and can effectively obtain features at different scales. To extract fine-grained geometric features, we design a position-adaptive pooling operator, which uses the relative positions and feature similarities of point pairs to weight and aggregate point features in local regions of the point cloud. To extract coarse-grained local features, we design a depth-wise convolution operator, which performs depth-wise convolution on voxel grids. The fine-grained geometric features and coarse-grained local features are fused by a simple addition, and the resulting geometry-adaptive features enable efficient shape analysis of point clouds, such as shape classification and part segmentation. Extensive experiments on the ModelNet40, ScanObjectNN, and ShapeNet Part benchmarks demonstrate that PVGANet achieves competitive performance compared with related methods.
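The two operators can be summarized with a minimal PyTorch sketch, shown below. The module names (`PositionAdaptivePooling`, `DepthwiseVoxelConv`), tensor shapes, dot-product similarity, and softmax weighting are illustrative assumptions; the sketch only mirrors the high-level description in the abstract (relative-position and similarity weighted pooling on points, depth-wise 3D convolution on voxels, fusion by addition), not the authors' actual implementation.

```python
import torch
import torch.nn as nn


class PositionAdaptivePooling(nn.Module):
    """Aggregates each point's K nearest neighbors, weighting them by
    relative position and feature similarity (illustrative sketch only)."""

    def __init__(self, channels: int, hidden: int = 32):
        super().__init__()
        # Small MLP mapping 3-D relative positions to per-channel scores.
        self.pos_mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(inplace=True), nn.Linear(hidden, channels)
        )

    def forward(self, xyz, feats, knn_idx):
        # xyz: (B, N, 3) coordinates, feats: (B, N, C) features,
        # knn_idx: (B, N, K) indices of each point's neighbors.
        B, N, C = feats.shape
        K = knn_idx.shape[-1]
        flat_idx = knn_idx.reshape(B, N * K)
        nbr_xyz = torch.gather(xyz, 1, flat_idx.unsqueeze(-1).expand(-1, -1, 3)).view(B, N, K, 3)
        nbr_feat = torch.gather(feats, 1, flat_idx.unsqueeze(-1).expand(-1, -1, C)).view(B, N, K, C)
        rel_pos = nbr_xyz - xyz.unsqueeze(2)                         # point pairs' relative positions
        sim = (nbr_feat * feats.unsqueeze(2)).sum(-1, keepdim=True)  # feature similarity (dot product)
        weights = torch.softmax(self.pos_mlp(rel_pos) + sim, dim=2)  # position- and similarity-aware weights
        return (weights * nbr_feat).sum(dim=2)                       # (B, N, C) aggregated point features


class DepthwiseVoxelConv(nn.Module):
    """Depth-wise 3D convolution over a voxel grid (coarse-grained branch)."""

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        # groups=channels turns the ordinary 3D convolution into a depth-wise one.
        self.conv = nn.Conv3d(channels, channels, kernel_size,
                              padding=kernel_size // 2, groups=channels)

    def forward(self, voxel_feats):
        # voxel_feats: (B, C, D, H, W) point features scattered onto a voxel grid.
        return self.conv(voxel_feats)
```

In a full network, the voxel-branch output would be devoxelized (interpolated back onto the points) and fused with the pooled point features by element-wise addition before the classification or segmentation head.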
Ethics declarations
Conflict of Interest: The authors declare that they have no conflict of interest.
Additional information
Recommended by ChinaMM 2023
The work was supported by the National Natural Science Foundation of China under Grant Nos. 62273034, 61973029, and 62076026, and the Scientific and Technological Innovation Foundation of Foshan under Grant No. BK21BF004.
Tian-Meng Zhao received his B.S. degree from University of Science and Technology Beijing, Beijing, in 2020. He is currently pursuing his master's degree at the same university. His main research interests include computer vision and point cloud processing.
Hui Zeng received her B.S. and M.S. degrees from Shandong University, Jinan, in 2001 and 2004, respectively, and received her Ph.D. degree from Institute of Automation, Chinese Academy of Sciences, Beijing, in 2007. She is currently a professor at the Beijing Engineering Research Center of Industrial Spectrum Imaging, School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing. Her main research interests include computer vision, pattern recognition, and machine learning.
Bao-Qing Zhang received his Ph.D. degree from University of Science and Technology Beijing, Beijing, in 2014. He is currently a senior engineer at Beijing Institute of Electronic System Engineering, Beijing. His main research interest is information processing.
Hong-Min Liu received her B.S. degree from Xidian University, Xi’an, in 2004, and her Ph.D. degree from the Institute of Electronics, Chinese Academy of Sciences, Beijing, in 2009. She is currently a professor with the School of Intelligence Science and Technology and the Institute of Artificial Intelligence, University of Science and Technology Beijing, Beijing. Her research focuses on image processing, computer vision, and pattern recognition.
Bin Fan received his B.S. degree from Beijing University of Chemical Technology, Beijing, in 2006, and his Ph.D. degree from the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, in 2011. He is currently a professor with the School of Intelligence Science and Technology and the Institute of Artificial Intelligence, University of Science and Technology Beijing, Beijing. His research focuses on computer vision, pattern recognition, image processing, and multimedia.
Cite this article
Zhao, TM., Zeng, H., Zhang, BQ. et al. Point-Voxel Based Geometry-Adaptive Network for 3D Point Cloud Analysis. J. Comput. Sci. Technol. 39, 1167–1179 (2024). https://doi.org/10.1007/s11390-024-3521-x