SGS-SLAM: Semantic Gaussian Splatting for Neural Dense SLAM

  • Conference paper
  • Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15089)
  • Included in the conference series: Computer Vision – ECCV 2024 (ECCV 2024)

Abstract

We present SGS-SLAM, the first semantic visual SLAM system based on Gaussian Splatting. It incorporates appearance, geometry, and semantic features through multi-channel optimization, addressing the oversmoothing limitations of neural implicit SLAM systems in high-quality rendering, scene understanding, and object-level geometry. We introduce a unique semantic feature loss that effectively compensates for the shortcomings of traditional depth and color losses in object optimization. Through a semantic-guided keyframe selection strategy, we prevent erroneous reconstructions caused by cumulative errors. Extensive experiments demonstrate that SGS-SLAM delivers state-of-the-art performance in camera pose estimation, map reconstruction, precise semantic segmentation, and object-level geometric accuracy, while ensuring real-time rendering capabilities.
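
The multi-channel optimization described above can be pictured as a weighted sum of per-pixel color, depth, and semantic rendering losses, with the semantic term supplying gradients where color and depth are ambiguous at object boundaries. Below is a minimal PyTorch-style sketch of that idea; the function name multi_channel_loss, the tensor layout, and the weights are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch of a multi-channel rendering loss combining appearance,
# geometry, and semantics, in the spirit of the abstract. All names and
# weights here are hypothetical illustrations.
import torch
import torch.nn.functional as F

def multi_channel_loss(rendered, observed,
                       w_color=1.0, w_depth=0.5, w_sem=0.5):
    """Combine color, depth, and semantic terms into one scalar objective.

    `rendered` and `observed` are dicts holding per-pixel color (H, W, 3),
    depth (H, W), and semantics: rendered class logits (H, W, C) and
    observed integer labels (H, W). The weights are illustrative.
    """
    # Appearance: L1 photometric error on the rasterized color channels.
    loss_color = F.l1_loss(rendered["color"], observed["color"])

    # Geometry: L1 depth error, masked to pixels with valid sensor depth.
    valid = observed["depth"] > 0
    loss_depth = F.l1_loss(rendered["depth"][valid], observed["depth"][valid])

    # Semantics: cross-entropy between rendered class logits and the
    # per-pixel ground-truth labels.
    num_classes = rendered["sem_logits"].shape[-1]
    loss_sem = F.cross_entropy(
        rendered["sem_logits"].reshape(-1, num_classes),
        observed["labels"].reshape(-1),
    )

    return w_color * loss_color + w_depth * loss_depth + w_sem * loss_sem
```

In a splatting-based SLAM loop, a scalar of this form would be minimized with respect to the Gaussian parameters during mapping and with respect to the camera pose during tracking.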

M. Li and S. Liu contributed equally to this work.



Author information

Corresponding author

Correspondence to Hongyu Wang.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 25179 KB)

Supplementary material 2 (mp4 36317 KB)


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Li, M. et al. (2025). SGS-SLAM: Semantic Gaussian Splatting for Neural Dense SLAM. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15089. Springer, Cham. https://doi.org/10.1007/978-3-031-72751-1_10

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-72751-1_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72750-4

  • Online ISBN: 978-3-031-72751-1

  • eBook Packages: Computer Science, Computer Science (R0)
