6DGS: 6D Pose Estimation from a Single Image and a 3D Gaussian Splatting Model

Bortolon, Matteo; Tsesmelis, Theodore; James, Stuart; Poiesi, Fabio; Del Bue, Alessio

doi:10.1007/978-3-031-72943-0_24

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15110)

Included in the following conference series:

European Conference on Computer Vision

Abstract

We propose 6DGS to estimate the camera pose of a target RGB image given a 3D Gaussian Splatting (3DGS) model representing the scene. 6DGS avoids the iterative process typical of analysis-by-synthesis methods (e.g., iNeRF), which also require an initial camera pose in order to converge. Instead, our method estimates a 6DoF pose by inverting the 3DGS rendering process. Starting from the object surface, we define a radiant Ellicell that uniformly generates rays departing from each of the ellipsoids that parameterize the 3DGS model. Each Ellicell ray is associated with the rendering parameters of its ellipsoid, which in turn are used to obtain the best bindings between the target image pixels and the cast rays. These pixel-ray bindings are then ranked to select the best-scoring bundle of rays, whose intersection provides the camera center and, in turn, the camera rotation. The proposed solution obviates the need for an a priori pose for initialization, and it solves 6DoF pose estimation in closed form, without the need for iterations. Moreover, compared to the existing Novel View Synthesis (NVS) baselines for pose estimation, 6DGS can improve the overall average rotational accuracy by $\mathbf{12\%}$ and translation accuracy by $\mathbf{22\%}$ on real scenes, despite not requiring any initialization pose. At the same time, our method operates near real-time, reaching 15 fps on consumer hardware.

Project page: https://mbortolon97.github.io/6dgs/.
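
The abstract states that the camera center is obtained from the intersection of the selected bundle of rays. As an illustration only, the following sketch shows a standard least-squares intersection of a set of 3D rays with NumPy. It is not taken from the paper: the function name `intersect_rays`, the optional per-ray `weights`, and the choice to treat rays as infinite lines are assumptions made for this example, and the authors' actual formulation may differ.

```python
import numpy as np


def intersect_rays(origins, directions, weights=None):
    """Return the point minimizing the weighted sum of squared distances
    to a bundle of rays (each ray is treated as an infinite line).

    Hypothetical helper for illustration; not the authors' implementation.
    """
    origins = np.asarray(origins, dtype=float)        # (N, 3) ray origins
    directions = np.asarray(directions, dtype=float)  # (N, 3) ray directions
    d = directions / np.linalg.norm(directions, axis=1, keepdims=True)
    if weights is None:
        weights = np.ones(len(d))
    weights = np.asarray(weights, dtype=float)        # (N,) e.g. binding scores

    # Projector onto the plane orthogonal to each ray: P_i = I - d_i d_i^T
    P = np.eye(3)[None] - d[:, :, None] * d[:, None, :]

    # Normal equations: (sum_i w_i P_i) c = sum_i w_i P_i o_i
    A = np.einsum("i,ijk->jk", weights, P)
    b = np.einsum("i,ijk,ik->j", weights, P, origins)

    # A is invertible as long as at least two rays are not parallel.
    return np.linalg.solve(A, b)


# Synthetic check: rays leaving random surface points and aimed at one center.
rng = np.random.default_rng(0)
center = np.array([1.0, 2.0, 3.0])
origins = rng.normal(size=(50, 3))
estimate = intersect_rays(origins, center - origins)
```

Because the problem reduces to a single 3x3 linear solve, this kind of estimate needs no iterations and no initial pose, which matches the closed-form character described in the abstract. With noisy rays, the optional weights could down-weight low-scoring pixel-ray bindings; how scores are actually used in 6DGS is not specified here.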

Acknowledgments

This work is part of the RePAIR project that has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 964854. This work has also received funding from the European Union’s Horizon Europe research and innovation programme under grant agreement No. 101092043, project AGILEHAND (Smart Grading, Handling and Packaging Solutions for Soft and Deformable Products in Agile and Reconfigurable Lines). We thank S. Fiorini for the discussion on the optimizers.

Author information

Authors and Affiliations

PAVIS, Fondazione Istituto Italiano di Tecnologia (IIT), Genoa, Italy
Matteo Bortolon, Theodore Tsesmelis, Stuart James & Alessio Del Bue
TeV, Fondazione Bruno Kessler (FBK), Trento, Italy
Matteo Bortolon & Fabio Poiesi
Università di Trento, Trento, Italy
Matteo Bortolon
Durham University, Durham, UK
Stuart James

Corresponding author

Correspondence to Matteo Bortolon.

Editor information

Editors and Affiliations

University of Birmingham, Birmingham, UK
Aleš Leonardis
University of Trento, Trento, Italy
Elisa Ricci
Technical University of Darmstadt, Darmstadt, Germany
Stefan Roth
Princeton University, Princeton, NJ, USA
Olga Russakovsky
Czech Technical University in Prague, Prague, Czech Republic
Torsten Sattler
École des Ponts ParisTech, Marne-la-Vallée, France
Gül Varol

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (mp4 102365 KB)

Supplementary material 2 (pdf 8570 KB)

Cite this paper

Bortolon, M., Tsesmelis, T., James, S., Poiesi, F., Del Bue, A. (2025). 6DGS: 6D Pose Estimation from a Single Image and a 3D Gaussian Splatting Model. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15110. Springer, Cham. https://doi.org/10.1007/978-3-031-72943-0_24

DOI: https://doi.org/10.1007/978-3-031-72943-0_24
Published: 29 November 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72942-3
Online ISBN: 978-3-031-72943-0
eBook Packages: Computer Science, Computer Science (R0)
