U-shaped network based on Transformer for 3D point clouds semantic segmentation

Authors:
Jiazhe Zhang, Xingwei Li, Xianfa Zhao, Yizhi Ge, and Zheng Zhang

College of Intelligence Science and Technology, National University of Defense Technology, China

ICVIP '21: Proceedings of the 2021 5th International Conference on Video and Image Processing, December 2021, pages 170–176. https://doi.org/10.1145/3511176.3511209

Published: 12 March 2022

ABSTRACT

3D point cloud processing is an important technical direction in autonomous driving, computer vision, and 3D mapping. However, the unordered and irregular nature of 3D point clouds makes them challenging to process. In recent years, the Transformer, an important technique in natural language processing, has been successfully applied to 2D image processing with excellent results, and research applying the Transformer to 3D point clouds has recently been published as well. In this paper, we draw on the self-attention mechanism of the Transformer architecture and propose a U-shaped Transformer-based network for 3D point cloud semantic segmentation. We conduct semantic segmentation experiments on the Stanford Large-Scale 3D Indoor Spaces Dataset (S3DIS). Experiments show that our proposed network outperforms several existing semantic segmentation algorithms on common evaluation metrics.
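The abstract gives no layer-level details, so the following is not the authors' network. It is a minimal sketch, under stated assumptions, of the standard scaled dot-product self-attention from the original Transformer, applied to a small set of point features in plain Python. The sizes, the random projection weights `w_q`, `w_k`, `w_v`, and the helper names are hypothetical. The sketch illustrates one reason attention suits unordered point clouds: permuting the input points permutes the output rows in exactly the same way (permutation equivariance).

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both as lists of lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def softmax(row):
    # Subtract the row maximum before exponentiating for numerical stability.
    mx = max(row)
    exps = [math.exp(v - mx) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over N point features.

    x: N x C point features; w_q, w_k: C x d; w_v: C x d_v (hypothetical weights).
    Returns N x d_v features in which every point aggregates information from all points.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d = len(w_q[0])
    scores = matmul(q, transpose(k))                      # N x N similarity matrix
    attn = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(attn, v)                                # weighted sum of values


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
N, C, D = 5, 3, 4
points = random_matrix(N, C, rng)                 # e.g. xyz coordinates of 5 points
w_q, w_k, w_v = (random_matrix(C, D, rng) for _ in range(3))

out = self_attention(points, w_q, w_k, w_v)

# Reorder the input points: the output rows are reordered the same way,
# so the result does not depend on the (arbitrary) storage order of the points.
perm = [3, 0, 4, 1, 2]
out_perm = self_attention([points[i] for i in perm], w_q, w_k, w_v)
assert all(abs(a - b) < 1e-9
           for i, p in enumerate(perm)
           for a, b in zip(out_perm[i], out[p]))
print(len(out), len(out[0]))
```

In a U-shaped (encoder-decoder) design, layers of this kind would typically sit inside downsampling and upsampling stages joined by skip connections; the abstract does not state how this paper arranges them.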

Published in

ICVIP '21: Proceedings of the 2021 5th International Conference on Video and Image Processing
December 2021
219 pages
ISBN:9781450385893
DOI:10.1145/3511176

Copyright © 2021 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
3D point clouds
S3DIS
Transformer
semantic segmentation