Abstract
Point cloud streaming is becoming increasingly popular, evolving into the norm for interactive service delivery and the future Metaverse. However, the substantial volume of data associated with point clouds presents numerous challenges, particularly high bandwidth consumption and large storage requirements. Despite the various solutions proposed so far, focusing on point cloud compression, upsampling, and completion, these reconstruction-related methods still fall short of delivering high-fidelity point cloud output. In this paper, we propose DiffPMAE, an effective point cloud reconstruction architecture. Inspired by self-supervised learning concepts, we combine a Masked Autoencoder and a Diffusion Model to remotely reconstruct point cloud data. By the nature of this reconstruction process, DiffPMAE can be extended to many related downstream tasks, including point cloud compression, upsampling, and completion. Leveraging the ShapeNet-55 and ModelNet datasets with over 60,000 objects, we validate that DiffPMAE exceeds many state-of-the-art methods in terms of autoencoding and the downstream tasks considered. Our source code is available at: https://github.com/TyraelDLee/DiffPMAE.
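The masked-autoencoder half of the pipeline above starts by grouping the point cloud into patches and hiding a subset of them. The following is a minimal illustrative sketch of that masking stage only; the patch count, mask ratio, and random-center grouping are assumptions for illustration (masked-point-modeling pipelines typically use farthest-point sampling plus kNN grouping), not the paper's actual implementation, which is available in the linked repository.

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_point_patches(points, num_patches=64, patch_size=32, mask_ratio=0.6):
    """Group a point cloud into patches, then split them into visible/masked sets.

    Illustrative only: patch centers are drawn uniformly at random here,
    whereas real pipelines usually pick centers with farthest-point sampling.
    """
    n = points.shape[0]
    centers = points[rng.choice(n, num_patches, replace=False)]
    # Each patch is the patch_size nearest neighbours of its center.
    dists = np.linalg.norm(points[None, :, :] - centers[:, None, :], axis=-1)
    idx = np.argsort(dists, axis=1)[:, :patch_size]
    patches = points[idx]  # shape: (num_patches, patch_size, 3)
    # Randomly mask a fixed ratio of patches; only the visible ones are encoded,
    # and the decoder (a diffusion model, in DiffPMAE's case) must recover the rest.
    num_masked = int(mask_ratio * num_patches)
    perm = rng.permutation(num_patches)
    visible = patches[perm[num_masked:]]
    masked = patches[perm[:num_masked]]
    return visible, masked

cloud = rng.standard_normal((2048, 3))  # toy stand-in for a real point cloud
visible, masked = mask_point_patches(cloud)
```

With the defaults above, 38 of the 64 patches are masked and 26 remain visible, giving arrays of shape `(38, 32, 3)` and `(26, 32, 3)` respectively.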
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Li, Y., Madarasingha, C., Thilakarathna, K. (2025). DiffPMAE: Diffusion Masked Autoencoders for Point Cloud Reconstruction. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15104. Springer, Cham. https://doi.org/10.1007/978-3-031-72952-2_21
DOI: https://doi.org/10.1007/978-3-031-72952-2_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72951-5
Online ISBN: 978-3-031-72952-2
eBook Packages: Computer Science (R0)