SGS-SLAM: Semantic Gaussian Splatting for Neural Dense SLAM

  • Conference paper
  • Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15089)
  • Included in the conference series: Computer Vision – ECCV 2024 (ECCV 2024)

Abstract

We present SGS-SLAM, the first semantic visual SLAM system based on Gaussian Splatting. It incorporates appearance, geometry, and semantic features through multi-channel optimization, addressing the oversmoothing limitations of neural implicit SLAM systems in high-quality rendering, scene understanding, and object-level geometry. We introduce a unique semantic feature loss that effectively compensates for the shortcomings of traditional depth and color losses in object optimization. Through a semantic-guided keyframe selection strategy, we prevent erroneous reconstructions caused by cumulative errors. Extensive experiments demonstrate that SGS-SLAM delivers state-of-the-art performance in camera pose estimation, map reconstruction, precise semantic segmentation, and object-level geometric accuracy, while ensuring real-time rendering capabilities.
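
The multi-channel optimization described above can be pictured as a weighted sum of per-pixel color, depth, and semantic rendering losses, with the semantic term supplying gradients where color and depth are ambiguous at object boundaries. Below is a minimal PyTorch-style sketch of that idea; the function name multi_channel_loss, the tensor layout, and the weights are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch of a multi-channel rendering loss combining appearance,
# geometry, and semantics, in the spirit of the abstract. All names and
# weights here are hypothetical illustrations.
import torch
import torch.nn.functional as F

def multi_channel_loss(rendered, observed,
                       w_color=1.0, w_depth=0.5, w_sem=0.5):
    """Combine color, depth, and semantic terms into one scalar objective.

    `rendered` and `observed` are dicts holding per-pixel color (H, W, 3),
    depth (H, W), and semantics: rendered class logits (H, W, C) and
    observed integer labels (H, W). The weights are illustrative.
    """
    # Appearance: L1 photometric error on the rasterized color channels.
    loss_color = F.l1_loss(rendered["color"], observed["color"])

    # Geometry: L1 depth error, masked to pixels with valid sensor depth.
    valid = observed["depth"] > 0
    loss_depth = F.l1_loss(rendered["depth"][valid], observed["depth"][valid])

    # Semantics: cross-entropy between rendered class logits and the
    # per-pixel ground-truth labels.
    num_classes = rendered["sem_logits"].shape[-1]
    loss_sem = F.cross_entropy(
        rendered["sem_logits"].reshape(-1, num_classes),
        observed["labels"].reshape(-1),
    )

    return w_color * loss_color + w_depth * loss_depth + w_sem * loss_sem
```

In a splatting-based SLAM loop, a scalar of this form would be minimized with respect to the Gaussian parameters during mapping and with respect to the camera pose during tracking.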

M. Li and S. Liu contributed equally to this work.



Author information

Corresponding author

Correspondence to Hongyu Wang.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 25179 KB)

Supplementary material 2 (mp4 36317 KB)


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Li, M. et al. (2025). SGS-SLAM: Semantic Gaussian Splatting for Neural Dense SLAM. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15089. Springer, Cham. https://doi.org/10.1007/978-3-031-72751-1_10

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-72751-1_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72750-4

  • Online ISBN: 978-3-031-72751-1

  • eBook Packages: Computer Science, Computer Science (R0)
