6DGS: 6D Pose Estimation from a Single Image and a 3D Gaussian Splatting Model

  • Conference paper
  • First Online:
Computer Vision – ECCV 2024 (ECCV 2024)

Abstract

We propose 6DGS to estimate the camera pose of a target RGB image given a 3D Gaussian Splatting (3DGS) model representing the scene. 6DGS avoids the iterative process typical of analysis-by-synthesis methods (e.g., iNeRF), which also require an initialization of the camera pose in order to converge. Instead, our method estimates a 6DoF pose by inverting the 3DGS rendering process. Starting from the object surface, we define a radiant Ellicell that uniformly generates rays departing from each ellipsoid that parameterizes the 3DGS model. Each Ellicell ray is associated with the rendering parameters of its ellipsoid, which are in turn used to obtain the best bindings between the target image pixels and the cast rays. These pixel-ray bindings are then ranked to select the best-scoring bundle of rays, whose intersection provides the camera center and, in turn, the camera rotation. The proposed solution obviates the need for an “a priori” pose for initialization and solves 6DoF pose estimation in closed form, without the need for iterations. Moreover, compared to existing Novel View Synthesis (NVS) baselines for pose estimation, 6DGS improves the overall average rotational accuracy by \(\mathbf{12\%}\) and translation accuracy by \(\mathbf{22\%}\) on real scenes, despite not requiring any initialization pose. At the same time, our method operates in near real time, reaching 15 fps on consumer hardware.

Project page: https://mbortolon97.github.io/6dgs/.
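
The closed-form step described in the abstract can be made concrete with a standard weighted least-squares ray intersection. The formulation below is a sketch in our own notation, not the exact expression from the paper: ray origins \(\mathbf{o}_i\) on the ellipsoid surfaces, unit directions \(\mathbf{d}_i\), and per-ray scores \(w_i\) are assumed symbols. The camera center \(\mathbf{t}^\ast\) minimizing the weighted sum of squared point-to-ray distances satisfies

\[
\mathbf{t}^\ast = \Big(\sum_i w_i \big(\mathbf{I} - \mathbf{d}_i \mathbf{d}_i^\top\big)\Big)^{-1} \sum_i w_i \big(\mathbf{I} - \mathbf{d}_i \mathbf{d}_i^\top\big)\, \mathbf{o}_i,
\]

i.e. a single \(3 \times 3\) linear solve, which is why no iterative optimization or pose initialization is required; once the camera center is fixed, the rotation follows from the bindings between the selected rays and their image pixels.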

References

  1. Google Maps NeRF integration. https://blog.google/products/maps/sustainable-immersive-maps-announcements/. Accessed 07 Mar 2024

  2. Akenine-Möller, T., Haines, E., Hoffman, N., et al.: Real-Time Rendering. AK Peters/CRC Press (2018)

  3. Almkvist, G., Berndt, B.: Gauss, Landen, Ramanujan, the arithmetic-geometric mean, ellipses, \(\pi\), and the Ladies Diary. Am. Math. Mon. 95(7), 585–608 (1988)

  4. Barron, J.T., Mildenhall, B., Verbin, D., Srinivasan, P.P., Hedman, P.: Mip-NeRF 360: unbounded anti-aliased neural radiance fields. In: CVPR (2022)

  5. Beckers, B., Beckers, P.: Fast and accurate view factor generation. In: FICUP (2016)

  6. Bortolon, M., Tsesmelis, T., James, S., Poiesi, F., Del Bue, A.: IFFNeRF: initialization free and fast 6DoF pose estimation from a single image and a NeRF model. In: ICRA (2024)

  7. Chen, A., Xu, Z., Geiger, A., Yu, J., Su, H.: TensoRF: tensorial radiance fields. In: ECCV (2022)

  8. Chen, S., et al.: Robust dual quadric initialization for forward-translating camera movements. RAL 6(3), 4712–4719 (2021)

  9. Crocco, M., Rubino, C., Del Bue, A.: Structure from motion with objects. In: CVPR (2016)

  10. Ding, M., Wang, Z., Sun, J., Shi, J., Luo, P.: CamNet: coarse-to-fine retrieval for camera re-localization. In: ICCV (2019)

  11. Gaudillière, V., Simon, G., Berger, M.O.: Camera relocalization with ellipsoidal abstraction of objects. In: ISMAR (2019)

  12. Gaudillière, V., Simon, G., Berger, M.O.: Perspective-2-Ellipsoid: bridging the gap between object detections and 6-DoF camera pose. RAL 5(4), 5189–5196 (2020)

  13. Gay, P., Rubino, C., Bansal, V., Del Bue, A.: Probabilistic structure from motion with objects (PSfMO). In: ICCV (2017)

  14. Gay, P., Stuart, J., Del Bue, A.: Visual graphs from motion (VGfM): scene understanding with object geometry reasoning. In: ACCV (2019)

  15. He, X., Sun, J., Wang, Y., Huang, D., Bao, H., Zhou, X.: OnePose++: keypoint-free one-shot object pose estimation without CAD models. In: NeurIPS (2022)

  16. Hosseinzadeh, M., Latif, Y., Pham, T., Suenderhauf, N., Reid, I.: Structure aware SLAM using quadrics and planes. In: ACCV (2019)

  17. Jacques, L., Masset, L., Kerschen, G.: Direction and surface sampling in ray tracing for spacecraft radiative heat transfer. Aerosp. Sci. Technol. 47 (2015)

  18. Kerbl, B., Kopanas, G., Leimkühler, T., Drettakis, G.: 3D Gaussian splatting for real-time radiance field rendering. ACM TOG 42(4) (2023)

  19. Kim, S., Min, J., Cho, M.: TransforMatcher: match-to-match attention for semantic correspondence. In: CVPR (2022)

  20. Knapitsch, A., Park, J., Zhou, Q.Y., Koltun, V.: Tanks and Temples: benchmarking large-scale scene reconstruction. ACM TOG 36(4) (2017)

  21. Laidlow, T., Davison, A.J.: Simultaneous localisation and mapping with quadric surfaces. In: 3DV (2022)

  22. Lee, J., Kim, B., Cho, M.: Self-supervised equivariant learning for oriented keypoint detection. In: CVPR (2022)

  23. Lee, J., Kim, B., Kim, S., Cho, M.: Learning rotation-equivariant features for visual correspondence. In: CVPR (2023)

  24. Liao, Z., Hu, Y., Zhang, J., Qi, X., Zhang, X., Wang, W.: SO-SLAM: semantic object SLAM with scale proportional and symmetrical texture constraints. RAL 7(2), 4008–4015 (2022)

  25. Lin, Y., et al.: Parallel inversion of neural radiance fields for robust pose estimation. In: ICRA (2023)

  26. Liu, L., Gu, J., Lin, K.Z., Chua, T.S., Theobalt, C.: Neural sparse voxel fields. In: NeurIPS (2020)

  27. Lowe, D.G.: Object recognition from local scale-invariant features. In: ICCV (1999)

  28. Luiten, J., Kopanas, G., Leibe, B., Ramanan, D.: Dynamic 3D Gaussians: tracking by persistent dynamic view synthesis. In: 3DV (2024)

  29. Maggio, D., Mario, C., Carlone, L.: VERF: runtime monitoring of pose estimation with neural radiance fields. In: ICCV (2023)

  30. Malley, T.: A shading method for computer generated images. Master’s thesis, Department of Computer Science, University of Utah (1988)

  31. Masset, L., Brüls, O., Kerschen, G.: Partition of the circle in cells of equal area and shape. Technical report, Structural Dynamics Research Group, Aerospace and Mechanical Engineering Department, University of Liège, Institut de Mécanique et Génie Civil (B52/3) (2011)

  32. Meng, Y., Zhou, B.: Ellipsoid SLAM with novel object initialization. In: CASE (2022)

  33. Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., Ng, R.: NeRF: representing scenes as neural radiance fields for view synthesis. In: ECCV (2020)

  34. Moreau, A., Piasco, N., Bennehar, M., Tsishkou, D., Stanciulescu, B., de La Fortelle, A.: CROSSFIRE: camera relocalization on self-supervised features from an implicit representation. In: ICCV (2023)

  35. Oquab, M., et al.: DINOv2: learning robust visual features without supervision (2023)

  36. Sarlin, P.E., DeTone, D., Malisiewicz, T., Rabinovich, A.: SuperGlue: learning feature matching with graph neural networks. In: CVPR (2020)

  37. Shan, M., Feng, Q., Jau, Y.Y., Atanasov, N.: ELLIPSDF: joint object pose and shape optimization with a bi-level ellipsoid and signed distance function description. In: ICCV (2021)

  38. Shazeer, N., Stern, M.: Adafactor: adaptive learning rates with sublinear memory cost. In: ICML (2018)

  39. Sinkhorn, R.: A relationship between arbitrary positive matrices and doubly stochastic matrices. Ann. Math. Stat. 35(2), 876–879 (1964)

  40. Tancik, M., et al.: Fourier features let networks learn high frequency functions in low dimensional domains. In: NeurIPS (2020)

  41. Tombari, F., Salti, S., Di Stefano, L.: Unique signatures of histograms for local surface description. In: ECCV (2010)

  42. Tsesmelis, T., Hasan, I., Cristani, M., Del Bue, A., Galasso, F.: RGBD2Lux: dense light intensity estimation with an RGBD sensor. In: WACV (2018)

  43. Wang, A., Kortylewski, A., Yuille, A.: NeMo: neural mesh models of contrastive features for robust 3D pose estimation. In: ICLR (2020)

  44. Wang, A., Wang, P., Sun, J., Kortylewski, A., Yuille, A.: VoGE: a differentiable volume renderer using Gaussian ellipsoids for analysis-by-synthesis. In: ICLR (2022)

  45. Xie, T., et al.: PhysGaussian: physics-integrated 3D Gaussians for generative dynamics. In: CVPR (2024)

  46. Yen-Chen, L., Florence, P., Barron, J.T., Rodriguez, A., Isola, P., Lin, T.Y.: iNeRF: inverting neural radiance fields for pose estimation. In: IROS (2021)

  47. Zins, M., Simon, G., Berger, M.O.: OA-SLAM: leveraging objects for camera relocalization in visual SLAM. In: ISMAR (2022)

Acknowledgments

This work is part of the RePAIR project that has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 964854. This work has also received funding from the European Union’s Horizon Europe research and innovation programme under grant agreement No. 101092043, project AGILEHAND (Smart Grading, Handling and Packaging Solutions for Soft and Deformable Products in Agile and Reconfigurable Lines). We thank S. Fiorini for the discussion on the optimizers.

Author information

Corresponding author

Correspondence to Matteo Bortolon.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (mp4 102365 KB)

Supplementary material 2 (pdf 8570 KB)

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Bortolon, M., Tsesmelis, T., James, S., Poiesi, F., Del Bue, A. (2025). 6DGS: 6D Pose Estimation from a Single Image and a 3D Gaussian Splatting Model. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15110. Springer, Cham. https://doi.org/10.1007/978-3-031-72943-0_24

  • DOI: https://doi.org/10.1007/978-3-031-72943-0_24

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72942-3

  • Online ISBN: 978-3-031-72943-0

  • eBook Packages: Computer Science, Computer Science (R0)
