Abstract
We introduce GRM, a large-scale reconstructor capable of recovering a 3D asset from sparse-view images in around 0.1 s. GRM is a feed-forward transformer-based model that efficiently incorporates multi-view information to translate the input pixels into pixel-aligned Gaussians, which are unprojected to form a set of densely distributed 3D Gaussians representing a scene. Together, our transformer architecture and the use of 3D Gaussians unlock a scalable and efficient reconstruction framework. Extensive experiments demonstrate that our method outperforms alternatives in both reconstruction quality and efficiency. We also showcase the potential of GRM in generative tasks, i.e., text-to-3D and image-to-3D, by integrating it with existing multi-view diffusion models. Our project website is at: https://justimyhxu.github.io/projects/grm/.
Y. Xu and Z. Shi—Equal Contribution.
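The abstract's core mechanism, predicting per-pixel Gaussian attributes and unprojecting them along camera rays into 3D space, can be sketched as follows. This is a minimal, hypothetical illustration rather than the authors' implementation: the attribute layout, activations, and tensor shapes are assumptions made for the example.

import torch

def unproject_pixel_gaussians(feat, ray_o, ray_d):
    # feat:  (V, H, W, 12) per-pixel predictions from the transformer, assumed to pack
    #        [distance (1), scale (3), rotation quaternion (4), opacity (1), rgb (3)]
    # ray_o: (V, H, W, 3) camera-ray origins for every input pixel
    # ray_d: (V, H, W, 3) unit ray directions for every input pixel
    # Returns flattened 3D Gaussian parameters, one Gaussian per input pixel.
    dist, scale, rot, opacity, rgb = feat.split([1, 3, 4, 1, 3], dim=-1)
    centers = ray_o + torch.nn.functional.softplus(dist) * ray_d  # place each Gaussian along its ray
    return {
        "xyz": centers.reshape(-1, 3),
        "scale": torch.exp(scale).reshape(-1, 3),                              # positive scales
        "rotation": torch.nn.functional.normalize(rot, dim=-1).reshape(-1, 4),  # unit quaternions
        "opacity": torch.sigmoid(opacity).reshape(-1, 1),
        "rgb": torch.sigmoid(rgb).reshape(-1, 3),
    }

# Example: 4 input views at 256 x 256 pixels yield 4 * 256 * 256 = 262,144 Gaussians.
V, H, W = 4, 256, 256
gaussians = unproject_pixel_gaussians(
    torch.randn(V, H, W, 12),
    torch.zeros(V, H, W, 3),
    torch.nn.functional.normalize(torch.randn(V, H, W, 3), dim=-1),
)
print(gaussians["xyz"].shape)  # torch.Size([262144, 3])

The resulting Gaussian set could then be rendered with a differentiable Gaussian-splatting rasterizer, which is what makes the representation fast to render and the pipeline trainable end to end.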
Acknowledgement
We would like to thank Shangzhan Zhang for his help with the demo video, and Minghua Liu for assisting with the evaluation of One-2-3-45++. This project was supported by Google, Samsung, and a Swiss Postdoc Mobility fellowship.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Xu, Y. et al. (2025). GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15073. Springer, Cham. https://doi.org/10.1007/978-3-031-72633-0_1
DOI: https://doi.org/10.1007/978-3-031-72633-0_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72632-3
Online ISBN: 978-3-031-72633-0