Graph Spectral Perturbation for 3D Point Cloud Contrastive Learning

Authors:

Jin Xie

MM '23: Proceedings of the 31st ACM International Conference on Multimedia

Pages 5389 - 5398

https://doi.org/10.1145/3581783.3612469

Published: 27 October 2023

Abstract

3D point cloud contrastive learning has attracted increasing attention due to its efficient learning ability. By distinguishing between positive and negative samples in the feature space, it can learn effective point cloud feature representations without manual annotation. However, most point cloud contrastive learning methods construct contrastive samples either by perturbing point clouds in data space, where the intensity of the perturbation can be difficult to control, or by introducing multi-modality/format data, which may introduce interference from the different modalities/formats. To this end, we propose a novel graph spectral perturbation based contrastive learning framework (GSPCon) for efficient and robust self-supervised 3D point cloud representation learning. It performs perturbations in the graph spectral domain to construct contrastive samples of the point cloud. Specifically, we first represent the point cloud as a k-nearest neighbors (KNN) graph and adaptively transform the coordinates of the points into the graph spectral domain using the graph Fourier transform (GFT). We then implement data augmentation in the graph spectral domain by perturbing the spectral representations. Finally, the contrastive samples are generated by applying the inverse graph Fourier transform (IGFT) to transform the augmented spectral representations back into point clouds. Experimental results show that our method achieves state-of-the-art performance on various downstream tasks. Source code is available at https://github.com/yh-han/GSPCon.git.
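
The pipeline described in the abstract (KNN graph, GFT, spectral perturbation, IGFT) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, which is in the repository linked above. The abstract does not specify the graph weighting, the Laplacian variant, or the form of the perturbation, so the unweighted symmetric KNN graph, the combinatorial Laplacian L = D - A, and the multiplicative Gaussian noise used here are all assumptions, and the function names `knn_graph_laplacian` and `spectral_perturb` are hypothetical.

```python
import numpy as np


def knn_graph_laplacian(points, k=10):
    """Return the combinatorial Laplacian L = D - A of a symmetric KNN graph.

    Assumption: unweighted edges; the abstract does not state the weighting.
    """
    n = points.shape[0]
    # Pairwise squared Euclidean distances, shape (n, n).
    diff = points[:, None, :] - points[None, :, :]
    dist2 = np.sum(diff ** 2, axis=-1)
    np.fill_diagonal(dist2, np.inf)  # a point is not its own neighbour
    # Indices of the k nearest neighbours of every point, shape (n, k).
    nbrs = np.argsort(dist2, axis=1)[:, :k]
    adj = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    adj[rows, nbrs.ravel()] = 1.0
    adj = np.maximum(adj, adj.T)  # symmetrize so that L is symmetric
    deg = np.diag(adj.sum(axis=1))
    return deg - adj


def spectral_perturb(points, k=10, sigma=0.05, rng=None):
    """Perturb an (n, 3) point cloud in the graph spectral domain."""
    rng = np.random.default_rng() if rng is None else rng
    lap = knn_graph_laplacian(points, k)
    # Columns of U are orthonormal eigenvectors (the graph Fourier basis).
    _, U = np.linalg.eigh(lap)
    spectrum = U.T @ points  # GFT of the x, y, z coordinate signals
    # Assumption: multiplicative Gaussian noise on every spectral coefficient.
    noise = rng.normal(0.0, sigma, size=spectrum.shape)
    perturbed = spectrum * (1.0 + noise)
    return U @ perturbed  # IGFT back to point coordinates


# Two independently perturbed views of one cloud form a positive pair.
cloud = np.random.default_rng(0).normal(size=(128, 3))
view_a = spectral_perturb(cloud, k=10, sigma=0.05)
view_b = spectral_perturb(cloud, k=10, sigma=0.05)
```

In this sketch `sigma` is the single knob for perturbation intensity, and with `sigma=0` the round trip through GFT and IGFT returns the original coordinates up to floating-point error. The dense eigendecomposition costs O(n^3), so it is only practical for small clouds.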

Cited By

Yin W, Lin S, Lu Y, Wang H, Cai J, Kankanhalli M, Prabhakaran B, Boll S, Subramanian R, Zheng L, Singh V, Cesar P, Xie L, Xu D. (2024). Diverse Consensuses Paired with Motion Estimation-Based Multi-Model Fitting. Proceedings of the 32nd ACM International Conference on Multimedia, 9281-9290. DOI: 10.1145/3664647.3681646. Online publication date: 28-Oct-2024.
https://dl.acm.org/doi/10.1145/3664647.3681646

Han Y, Xu C, Xu R, Qian J, Xie J. (2024). Masked Motion Prediction with Semantic Contrast for Point Cloud Sequence Learning. Computer Vision – ECCV 2024, 414-431. DOI: 10.1007/978-3-031-73116-7_24. Online publication date: 31-Oct-2024.
https://doi.org/10.1007/978-3-031-73116-7_24

Index Terms

Graph Spectral Perturbation for 3D Point Cloud Contrastive Learning
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision representations
        Shape representations

Published In

MM '23: Proceedings of the 31st ACM International Conference on Multimedia

October 2023

9913 pages

ISBN:9798400701085

DOI:10.1145/3581783

General Chairs:
Abdulmotaleb El Saddik (University of Ottawa, Canada & MBZUAI, UAE), Tao Mei (HiDream.ai, China), Rita Cucchiara (University of Modena and Reggio Emilia, Italy)

Program Chairs:
Marco Bertini (University of Florence, Italy), Diana Patricia Tobon Vallejo (Universidad de Medellin, Colombia), Pradeep K. Atrey (University at Albany, State University of New York, USA), M. Shamim Hossain (King Saud University, KSA)

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

the National Science Fund of China

Conference

MM '23

Sponsor:

SIGMM

MM '23: The 31st ACM International Conference on Multimedia

October 29 - November 3, 2023

Ottawa ON, Canada

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
