Abstract
Point cloud streaming is becoming increasingly popular, evolving into the norm for interactive service delivery and the future Metaverse. However, the substantial volume of data associated with point clouds presents numerous challenges, particularly high bandwidth consumption and large storage requirements. Despite the various solutions proposed so far, focusing on point cloud compression, upsampling, and completion, these reconstruction-related methods still fall short of delivering high-fidelity point cloud output. In this paper, we propose DiffPMAE, an effective point cloud reconstruction architecture. Inspired by self-supervised learning concepts, we combine a Masked Autoencoder and a Diffusion Model to remotely reconstruct point cloud data. By the nature of this reconstruction process, DiffPMAE can be extended to many related downstream tasks, including point cloud compression, upsampling, and completion. Leveraging the ShapeNet-55 and ModelNet datasets with over 60,000 objects, we validate that DiffPMAE exceeds many state-of-the-art methods in terms of autoencoding and the downstream tasks considered. Our source code is available at: https://github.com/TyraelDLee/DiffPMAE.
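The masked-autoencoder half of the pipeline above starts by grouping the point cloud into patches and hiding a subset of them. The following is a minimal illustrative sketch of that masking stage only; the patch count, mask ratio, and random-center grouping are assumptions for illustration (masked-point-modeling pipelines typically use farthest-point sampling plus kNN grouping), not the paper's actual implementation, which is available in the linked repository.

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_point_patches(points, num_patches=64, patch_size=32, mask_ratio=0.6):
    """Group a point cloud into patches, then split them into visible/masked sets.

    Illustrative only: patch centers are drawn uniformly at random here,
    whereas real pipelines usually pick centers with farthest-point sampling.
    """
    n = points.shape[0]
    centers = points[rng.choice(n, num_patches, replace=False)]
    # Each patch is the patch_size nearest neighbours of its center.
    dists = np.linalg.norm(points[None, :, :] - centers[:, None, :], axis=-1)
    idx = np.argsort(dists, axis=1)[:, :patch_size]
    patches = points[idx]  # shape: (num_patches, patch_size, 3)
    # Randomly mask a fixed ratio of patches; only the visible ones are encoded,
    # and the decoder (a diffusion model, in DiffPMAE's case) must recover the rest.
    num_masked = int(mask_ratio * num_patches)
    perm = rng.permutation(num_patches)
    visible = patches[perm[num_masked:]]
    masked = patches[perm[:num_masked]]
    return visible, masked

cloud = rng.standard_normal((2048, 3))  # toy stand-in for a real point cloud
visible, masked = mask_point_patches(cloud)
```

With the defaults above, 38 of the 64 patches are masked and 26 remain visible, giving arrays of shape `(38, 32, 3)` and `(26, 32, 3)` respectively.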
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Li, Y., Madarasingha, C., Thilakarathna, K. (2025). DiffPMAE: Diffusion Masked Autoencoders for Point Cloud Reconstruction. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15104. Springer, Cham. https://doi.org/10.1007/978-3-031-72952-2_21
DOI: https://doi.org/10.1007/978-3-031-72952-2_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72951-5
Online ISBN: 978-3-031-72952-2
eBook Packages: Computer Science (R0)