
DiffPMAE: Diffusion Masked Autoencoders for Point Cloud Reconstruction

  • Conference paper
Computer Vision – ECCV 2024 (ECCV 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15104)


Abstract

Point cloud streaming is becoming increasingly popular, evolving into the norm for interactive service delivery and the future Metaverse. However, the substantial volume of data associated with point clouds presents numerous challenges, particularly high bandwidth consumption and large storage requirements. Despite the various solutions proposed so far, focusing on point cloud compression, upsampling, and completion, these reconstruction-related methods still fall short of delivering high-fidelity point cloud output. As a solution, we propose DiffPMAE, an effective point cloud reconstruction architecture. Inspired by self-supervised learning concepts, we combine a Masked Autoencoder and a Diffusion Model to remotely reconstruct point cloud data. By the nature of this reconstruction process, DiffPMAE can be extended to many related downstream tasks, including point cloud compression, upsampling, and completion. Leveraging the ShapeNet-55 and ModelNet datasets with over 60,000 objects, we validate that the performance of DiffPMAE exceeds many state-of-the-art methods in terms of autoencoding and the downstream tasks considered. Our source code is available at: https://github.com/TyraelDLee/DiffPMAE.
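To make the high-level pipeline in the abstract concrete, the sketch below illustrates the general MAE-plus-diffusion recipe: a point cloud is grouped into local patches, a high mask ratio hides most patches, only visible patches are encoded, and the masked patches are generated by an iterative denoising loop. Everything here is a hedged toy illustration, not the paper's actual architecture: the patch grouping, the mean-pooling "encoder", and the `ddpm_reverse_toy` denoiser are all placeholder stand-ins for learned components.

```python
import numpy as np

rng = np.random.default_rng(0)

def group_patches(points, num_patches=8, patch_size=32):
    # Hypothetical patch grouping: pick random centers and gather each
    # center's nearest neighbors (real pipelines typically use
    # farthest-point sampling + k-NN).
    centers = points[rng.choice(len(points), num_patches, replace=False)]
    dists = np.linalg.norm(points[None, :, :] - centers[:, None, :], axis=-1)
    idx = np.argsort(dists, axis=1)[:, :patch_size]
    return points[idx], centers          # (P, K, 3), (P, 3)

def mask_split(num_patches, mask_ratio=0.75):
    # MAE-style random masking over patches: most are hidden,
    # only the rest are fed to the encoder.
    perm = rng.permutation(num_patches)
    n_mask = int(num_patches * mask_ratio)
    return perm[n_mask:], perm[:n_mask]  # visible indices, masked indices

def ddpm_reverse_toy(shape, steps=50):
    # Toy stand-in for the reverse diffusion process: start from
    # Gaussian noise and iteratively damp it; a trained model would
    # instead predict and subtract the noise at each step.
    x = rng.standard_normal(shape)
    for t in range(steps):
        x = 0.9 * x + 0.1 * rng.standard_normal(shape) * (1 - t / steps)
    return x

points = rng.standard_normal((1024, 3))          # synthetic point cloud
patches, centers = group_patches(points)
vis, msk = mask_split(len(patches))
latent = patches[vis].mean(axis=1)               # placeholder encoder
recon = ddpm_reverse_toy(patches[msk].shape)     # "generate" masked patches
```

With a mask ratio of 0.75 over 8 patches, 6 patches are generated by the denoising loop while only 2 are encoded, which is the property that makes the scheme attractive for compression-style downstream tasks: only the visible latent need be transmitted.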




Author information


Corresponding author

Correspondence to Yanlong Li.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 4495 KB)


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Li, Y., Madarasingha, C., Thilakarathna, K. (2025). DiffPMAE: Diffusion Masked Autoencoders for Point Cloud Reconstruction. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15104. Springer, Cham. https://doi.org/10.1007/978-3-031-72952-2_21


  • DOI: https://doi.org/10.1007/978-3-031-72952-2_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72951-5

  • Online ISBN: 978-3-031-72952-2

  • eBook Packages: Computer Science (R0)
