
GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation

  • Conference paper
Computer Vision – ECCV 2024 (ECCV 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15073)


Abstract

We introduce GRM, a large-scale reconstructor capable of recovering a 3D asset from sparse-view images in around 0.1 s. GRM is a feed-forward transformer-based model that efficiently incorporates multi-view information to translate the input pixels into pixel-aligned Gaussians, which are unprojected to create a set of densely distributed 3D Gaussians representing a scene. Together, our transformer architecture and the use of 3D Gaussians unlock a scalable and efficient reconstruction framework. Extensive experimental results demonstrate the superiority of our method over alternatives regarding both reconstruction quality and efficiency. We also showcase the potential of GRM in generative tasks, i.e., text-to-3D and image-to-3D, by integrating it with existing multi-view diffusion models. Our project website is at: https://justimyhxu.github.io/projects/grm/.
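
To make the pixel-aligned Gaussian idea concrete, the sketch below shows one plausible way per-pixel predictions could be lifted into 3D Gaussians along camera rays. It is a minimal illustration, not the GRM architecture or the authors' code: the 12-channel parameter layout, the activation choices, and the unproject_pixel_gaussians helper are assumptions made for this example only.

```python
# Hypothetical sketch (PyTorch) of unprojecting pixel-aligned Gaussian
# parameters along camera rays; channel layout and activations are
# illustrative assumptions, not the actual GRM implementation.
import torch
import torch.nn.functional as F

def unproject_pixel_gaussians(feat, rays_o, rays_d):
    """feat:   (N, 12) per-pixel predictions, assumed to be laid out as
               [depth(1) | rotation quaternion(4) | scale(3) | opacity(1) | rgb(3)]
       rays_o: (N, 3) camera ray origins, one per pixel
       rays_d: (N, 3) unit ray directions, one per pixel
       Returns one 3D Gaussian per input pixel."""
    depth   = F.softplus(feat[:, 0:1])            # positive distance along the ray
    centers = rays_o + depth * rays_d             # lift each pixel to a 3D center
    rot     = F.normalize(feat[:, 1:5], dim=-1)   # unit quaternion
    scale   = torch.exp(feat[:, 5:8])             # positive anisotropic scales
    opacity = torch.sigmoid(feat[:, 8:9])
    rgb     = torch.sigmoid(feat[:, 9:12])
    return {"xyz": centers, "rot": rot, "scale": scale,
            "opacity": opacity, "rgb": rgb}

# Gaussians predicted from all input views would simply be concatenated to
# form the densely distributed set representing the scene, e.g.:
#   scene = {k: torch.cat([g[k] for g in per_view], dim=0) for k in per_view[0]}
```

Passing the depth through a softplus and the scales through an exponential keeps the Gaussians in front of the camera and non-degenerate; the parameterization actually used in the paper may differ.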

Y. Xu and Z. Shi—Equal Contribution.



Acknowledgement

We would like to thank Shangzhan Zhang for his help with the demo video, and Minghua Liu for assisting with the evaluation of One-2-3-45++. This project was supported by Google, Samsung, and a Swiss Postdoc Mobility fellowship.

Author information


Corresponding author

Correspondence to Yinghao Xu.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 6027 KB)


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Xu, Y. et al. (2025). GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15073. Springer, Cham. https://doi.org/10.1007/978-3-031-72633-0_1


  • DOI: https://doi.org/10.1007/978-3-031-72633-0_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72632-3

  • Online ISBN: 978-3-031-72633-0

  • eBook Packages: Computer Science, Computer Science (R0)
