Abstract
We propose 6DGS to estimate the camera pose of a target RGB image given a 3D Gaussian Splatting (3DGS) model representing the scene. 6DGS avoids the iterative process typical of analysis-by-synthesis methods (e.g., iNeRF), which also require an initialization of the camera pose in order to converge. Instead, our method estimates a 6DoF pose by inverting the 3DGS rendering process. Starting from the object surface, we define a radiant Ellicell that uniformly generates rays departing from each of the ellipsoids that parameterize the 3DGS model. Each Ellicell ray is associated with the rendering parameters of its ellipsoid, which in turn are used to obtain the best bindings between the target image pixels and the cast rays. These pixel-ray bindings are then ranked to select the best-scoring bundle of rays, whose intersection provides the camera center and, in turn, the camera rotation. The proposed solution obviates the need for an a priori pose for initialization and solves 6DoF pose estimation in closed form, without iterations. Moreover, compared to existing Novel View Synthesis (NVS) baselines for pose estimation, 6DGS can improve the overall average rotational accuracy by \(\mathbf{12\%}\) and translation accuracy by \(\mathbf{22\%}\) on real scenes, despite not requiring any initialization pose. At the same time, our method operates in near real time, reaching 15 fps on consumer hardware.
Project page: https://mbortolon97.github.io/6dgs/.
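The last step described in the abstract, recovering the camera center from the intersection of a selected bundle of rays, can be illustrated with a standard weighted least-squares line intersection. The sketch below is not the authors' implementation: the function name `intersect_rays`, the optional `weights` argument (standing in for the ranking scores), and the synthetic data are assumptions made for illustration. It does not cover the Ellicell ray generation, the pixel-ray binding, or the rotation recovery.

```python
import numpy as np


def intersect_rays(origins, directions, weights=None):
    """Point closest (in weighted least squares) to a bundle of 3D rays.

    Each ray is treated as an infinite line through origins[i] along
    directions[i]. Hypothetical helper, not taken from the 6DGS code base.
    """
    origins = np.asarray(origins, dtype=float)
    directions = np.asarray(directions, dtype=float)
    # Normalize directions so that the projectors below are well defined.
    d = directions / np.linalg.norm(directions, axis=1, keepdims=True)
    if weights is None:
        weights = np.ones(len(d))
    weights = np.asarray(weights, dtype=float)
    # P_i = I - d_i d_i^T projects onto the plane orthogonal to ray i.
    P = np.eye(3)[None] - d[:, :, None] * d[:, None, :]
    # Normal equations: (sum_i w_i P_i) x = sum_i w_i P_i o_i
    A = np.einsum("i,ijk->jk", weights, P)
    b = np.einsum("i,ijk,ik->j", weights, P, origins)
    return np.linalg.solve(A, b)


# Synthetic check: rays leaving random surface points and passing
# through a known camera center (assumed values, for illustration only).
rng = np.random.default_rng(0)
camera_center = np.array([1.0, -2.0, 0.5])
origins = rng.normal(size=(20, 3))
directions = camera_center - origins
estimate = intersect_rays(origins, directions)
```

Because the system is linear, the solution is obtained in a single solve, which matches the closed-form, initialization-free character the abstract claims. With noisy rays, the weights would let higher-scoring bindings contribute more to the estimate.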
References
Google maps nerf integration. https://blog.google/products/maps/sustainable-immersive-maps-announcements/. Accessed 07 Mar 2024
Akenine-Möller, T., Haines, E., Hoffman, N., et al.: Real-Time Rendering. AK Peters/CRC Press (2018)
Almkvist, G., Berndt, B.: Gauss, Landen, Ramanujan, the arithmetic-geometric mean, ellipses, \(\pi \), and the ladies diary. Am. Math. Mon. 95(7), 585–608 (1988)
Barron, J.T., Mildenhall, B., Verbin, D., Srinivasan, P.P., Hedman, P.: MIP-NeRF 360: unbounded anti-aliased neural radiance fields. In: CVPR (2022)
Beckers, B., Beckers, P.: Fast and accurate view factor generation. In: FICUP (2016)
Bortolon, M., Tsesmelis, T., James, S., Poiesi, F., Del Bue, A.: Iffnerf: initialization free and fast 6DoF pose estimation from a single image and a nerf model. In: ICRA (2024)
Chen, A., Xu, Z., Geiger, A., Yu, J., Su, H.: Tensorf: tensorial radiance fields. In: ECCV (2022)
Chen, S., et al.: Robust dual quadric initialization for forward-translating camera movements. RAL 6(3), 4712–4719 (2021)
Crocco, M., Rubino, C., Del Bue, A.: Structure from motion with objects. In: CVPR (2016)
Ding, M., Wang, Z., Sun, J., Shi, J., Luo, P.: Camnet: coarse-to-fine retrieval for camera re-localization. In: ICCV (2019)
Gaudillière, V., Simon, G., Berger, M.O.: Camera relocalization with ellipsoidal abstraction of objects. In: ISMAR (2019)
Gaudillière, V., Simon, G., Berger, M.O.: Perspective-2-ellipsoid: bridging the gap between object detections and 6-DoF camera pose. RAL 5(4), 5189–5196 (2020)
Gay, P., Rubino, C., Bansal, V., Del Bue, A.: Probabilistic structure from motion with objects (PSFMO). In: ICCV
Gay, P., Stuart, J., Del Bue, A.: Visual graphs from motion (VGFM): scene understanding with object geometry reasoning. In: ACCV (2019)
He, X., Sun, J., Wang, Y., Huang, D., Bao, H., Zhou, X.: Onepose++: keypoint-free one-shot object pose estimation without cad models. In: NeurIPS (2022)
Hosseinzadeh, M., Latif, Y., Pham, T., Suenderhauf, N., Reid, I.: Structure aware slam using quadrics and planes. In: ACCV (2019)
Jacques, L., Masset, L., Kerschen, G.: Direction and surface sampling in ray tracing for spacecraft radiative heat transfer. Aerosp. Sci. Technol. 47 (2015)
Kerbl, B., Kopanas, G., Leimkühler, T., Drettakis, G.: 3D Gaussian splatting for real-time radiance field rendering. TOG 42(4) (2023)
Kim, S., Min, J., Cho, M.: Transformatcher: match-to-match attention for semantic correspondence. In: CVPR (2022)
Knapitsch, A., Park, J., Zhou, Q.Y., Koltun, V.: Tanks and temples: benchmarking large-scale scene reconstruction. TOG 36(4) (2017)
Laidlow, T., Davison, A.J.: Simultaneous localisation and mapping with quadric surfaces. In: 3DV (2022)
Lee, J., Kim, B., Cho, M.: Self-supervised equivariant learning for oriented keypoint detection. In: CVPR (2022)
Lee, J., Kim, B., Kim, S., Cho, M.: Learning rotation-equivariant features for visual correspondence. In: CVPR
Liao, Z., Hu, Y., Zhang, J., Qi, X., Zhang, X., Wang, W.: So-SLAM: semantic object slam with scale proportional and symmetrical texture constraints. RAL 7(2), 4008–4015 (2022)
Lin, Y., et al.: Parallel inversion of neural radiance fields for robust pose estimation. In: ICRA (2023)
Liu, L., Gu, J., Lin, K.Z., Chua, T.S., Theobalt, C.: Neural sparse voxel fields. In: NeurIPS (2020)
Lowe, D.G.: Object recognition from local scale-invariant features. In: ICCV
Luiten, J., Kopanas, G., Leibe, B., Ramanan, D.: Dynamic 3D Gaussians: tracking by persistent dynamic view synthesis. In: 3DV (2024)
Maggio, D., Mario, C., Carlone, L.: VERF: runtime monitoring of pose estimation with neural radiance fields. In: ICCV (2023)
Malley, T.: A shading method for computer generated images. Master’s thesis, Department of Computer Science, University of Utah (1988)
Masset, L., Brüls, O., Kerschen, G.: Partition of the circle in cells of equal area and shape. Technical report, Structural Dynamics Research Group, Aerospace and Mechanical Engineering Department, University of Liège, Institut de Mécanique et Génie Civil (B52/3) (2011)
Meng, Y., Zhou, B.: Ellipsoid slam with novel object initialization. In: CASE (2022)
Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., Ng, R.: NeRF: representing scenes as neural radiance fields for view synthesis. In: ECCV (2020)
Moreau, A., Piasco, N., Bennehar, M., Tsishkou, D., Stanciulescu, B., de La Fortelle, A.: Crossfire: camera relocalization on self-supervised features from an implicit representation. In: ICCV (2023)
Oquab, M., et al.: DINOv2: learning robust visual features without supervision (2023)
Sarlin, P.E., DeTone, D., Malisiewicz, T., Rabinovich, A.: Superglue: learning feature matching with graph neural networks. In: CVPR (2020)
Shan, M., Feng, Q., Jau, Y.Y., Atanasov, N.: ELLIPSDF: joint object pose and shape optimization with a bi-level ellipsoid and signed distance function description. In: ICCV (2021)
Shazeer, N., Stern, M.: Adafactor: adaptive learning rates with sublinear memory cost. In: ICML (2018)
Sinkhorn, R.: A relationship between arbitrary positive matrices and doubly stochastic matrices. Ann. Math. Stat. 35(2), 876–879 (1964)
Tancik, M., et al.: Fourier features let networks learn high frequency functions in low dimensional domains. In: NeurIPS (2020)
Tombari, F., Salti, S., di Stefano, L.: Unique signatures of histograms for local surface description. In: ECCV (2010)
Tsesmelis, T., Hasan, I., Cristani, M., Bue, A.D., Galasso, F.: RGBD2lux: dense light intensity estimation with an RGBD sensor. In: WACV (2018)
Wang, A., Kortylewski, A., Yuille, A.: Nemo: neural mesh models of contrastive features for robust 3D pose estimation. In: ICLR (2020)
Wang, A., Wang, P., Sun, J., Kortylewski, A., Yuille, A.: Voge: a differentiable volume renderer using Gaussian ellipsoids for analysis-by-synthesis. In: ICLR (2022)
Xie, T., et al.: PhysGaussian: physics-integrated 3D Gaussians for generative dynamics. In: CVPR (2024)
Yen-Chen, L., Florence, P., Barron, J.T., Rodriguez, A., Isola, P., Lin, T.Y.: iNeRF: inverting neural radiance fields for pose estimation. In: IROS (2021)
Zins, M., Simon, G., Berger, M.O.: OA-SLAM: leveraging objects for camera relocalization in visual SLAM. In: ISMAR (2022)
Acknowledgments
This work is part of the RePAIR project that has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 964854. This work has also received funding from the European Union’s Horizon Europe research and innovation programme under grant agreement No. 101092043, project AGILEHAND (Smart Grading, Handling and Packaging Solutions for Soft and Deformable Products in Agile and Reconfigurable Lines). We thank S. Fiorini for the discussion on the optimizers.
1 Electronic supplementary material
Below is the link to the electronic supplementary material.
Supplementary material 1 (mp4 102365 KB)
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Bortolon, M., Tsesmelis, T., James, S., Poiesi, F., Del Bue, A. (2025). 6DGS: 6D Pose Estimation from a Single Image and a 3D Gaussian Splatting Model. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15110. Springer, Cham. https://doi.org/10.1007/978-3-031-72943-0_24
DOI: https://doi.org/10.1007/978-3-031-72943-0_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72942-3
Online ISBN: 978-3-031-72943-0
eBook Packages: Computer Science, Computer Science (R0)