Abstract:
Self-supervised learning has achieved great success in both natural language processing and 2D vision, where masked modeling is a popular pre-training scheme. However, extending masking to 3D point cloud understanding in a way that combines local and global features poses a new challenge. In this work, we present Point-LGMask, a novel method that embeds both local and global contexts with multi-ratio masking, which is highly effective for self-supervised feature learning on point clouds but is overlooked by existing pre-training works. Specifically, to avoid overfitting to a fixed masking ratio, we first propose multi-ratio masking, which prompts the encoder to fully explore representative features through tasks of varying difficulty. Next, to encourage the embedding of both local and global features, we formulate a compound loss, which consists of (i) a global representation contrastive loss that encourages the cluster assignments of the masked point clouds to be consistent with those of the complete input, and (ii) a local point cloud prediction loss that encourages accurate prediction of the masked points. Equipped with Point-LGMask, our learned representations transfer well to various downstream tasks, including few-shot classification, shape classification, object part segmentation, as well as real-world scene-based 3D object detection and 3D semantic segmentation. In particular, our model substantially advances existing pre-training methods on the challenging few-shot classification task on the real-captured ScanObjectNN dataset, surpassing the second-best method by over 4%. Our Point-LGMask also achieves gains of 0.4% AP_{25} and 0.8% AP_{50} on the 3D object detection task over the second-best method. For semantic segmentation, Point-LGMask surpasses the second-best method by 0.4% mAcc and 0.5% mIoU.
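To make the abstract's training objective concrete, below is a minimal sketch of how multi-ratio masking and the compound loss could be wired together. It is based only on the description above, not on the authors' code: the module names (encoder, predictor, cluster_head), the masking ratios, the use of cross-entropy for the global consistency term, and the Chamfer distance for the local prediction term are all assumptions for illustration.

```python
# Hedged sketch of a Point-LGMask-style objective (assumptions noted above).
import torch
import torch.nn.functional as F


def multi_ratio_mask(points, ratios=(0.3, 0.6, 0.9)):
    """Return masked copies of a point cloud (B, N, 3), one per masking ratio.

    The ratios here are placeholders; the paper's actual values may differ.
    """
    views = []
    n = points.shape[1]
    for r in ratios:
        keep = torch.randperm(n)[: int(n * (1 - r))]
        views.append((points[:, keep, :], keep))
    return views


def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a (B, N, 3) and b (B, M, 3)."""
    d = torch.cdist(a, b)  # pairwise distances, shape (B, N, M)
    return d.min(dim=2).values.mean() + d.min(dim=1).values.mean()


def compound_loss(encoder, predictor, cluster_head, points,
                  ratios=(0.3, 0.6, 0.9), lam=1.0):
    """Sum, over masking ratios, of
    (i) a global consistency loss between cluster assignments of each masked
        view and those of the complete input, and
    (ii) a local prediction loss on the points that were masked out.
    """
    full_feat = encoder(points)                        # global feature of the complete input
    target_assign = cluster_head(full_feat).detach()   # target cluster assignment (B, K)

    loss = points.new_zeros(())
    for masked_pts, keep_idx in multi_ratio_mask(points, ratios):
        feat = encoder(masked_pts)

        # (i) global term: masked-view assignment should match the full input's
        assign = cluster_head(feat)
        loss_global = F.cross_entropy(assign, target_assign.argmax(dim=-1))

        # (ii) local term: reconstruct the masked-out points
        mask = torch.ones(points.shape[1], dtype=torch.bool)
        mask[keep_idx] = False
        pred = predictor(feat)                         # predicted masked points (B, M, 3)
        loss_local = chamfer_distance(pred, points[:, mask, :])

        loss = loss + loss_global + lam * loss_local
    return loss
```

In this sketch, `encoder`, `predictor`, and `cluster_head` stand for the point-cloud backbone, the masked-point prediction head, and the clustering head implied by the abstract; iterating the loss over several ratios is what realizes the "tasks of different difficulties" idea.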
Published in: IEEE Transactions on Multimedia ( Volume: 26)