Abstract
We present SGS-SLAM, the first semantic visual SLAM system based on Gaussian Splatting. It jointly optimizes appearance, geometry, and semantic features through multi-channel optimization. This addresses the oversmoothing that limits neural implicit SLAM systems in high-quality rendering, scene understanding, and object-level geometry. We introduce a semantic feature loss that compensates for the shortcomings of conventional depth and color losses when optimizing individual objects. A semantic-guided keyframe selection strategy prevents erroneous reconstructions caused by accumulated errors. Extensive experiments show that SGS-SLAM achieves state-of-the-art performance in camera pose estimation, map reconstruction, semantic segmentation, and object-level geometric accuracy, while maintaining real-time rendering.
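To make the two mechanisms named above more concrete, the following is a minimal Python/NumPy sketch written under stated assumptions. It is not the authors' implementation. It assumes that rendered and ground-truth color, depth, and semantic maps are available as arrays, that semantics are encoded as a 3-channel color map, and that the multi-channel objective is a weighted sum of per-channel L1 terms over pixels with valid depth. The weights, the function names (multichannel_l1_loss, semantic_overlap, select_keyframes), and the Jaccard class-overlap keyframe criterion are all hypothetical choices made for illustration.

```python
import numpy as np


def multichannel_l1_loss(pred, gt, weights=(0.5, 1.0, 0.1)):
    """Weighted sum of L1 losses over color, depth, and semantic channels.

    pred / gt: dicts with 'color' (H, W, 3), 'depth' (H, W), 'semantic' (H, W, 3).
    ASSUMPTION: the weights and the L1 form are hypothetical, not taken from the paper.
    """
    valid = gt["depth"] > 0  # ignore pixels without a depth measurement
    if not valid.any():
        return 0.0
    w_color, w_depth, w_sem = weights
    # Boolean (H, W) masks select (N, 3) or (N,) arrays of valid pixels.
    l_color = np.abs(pred["color"] - gt["color"])[valid].mean()
    l_depth = np.abs(pred["depth"] - gt["depth"])[valid].mean()
    l_sem = np.abs(pred["semantic"] - gt["semantic"])[valid].mean()
    return float(w_color * l_color + w_depth * l_depth + w_sem * l_sem)


def semantic_overlap(labels_a, labels_b):
    """Jaccard overlap between the sets of class ids visible in two label maps."""
    a, b = set(np.unique(labels_a)), set(np.unique(labels_b))
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_keyframes(current_labels, keyframe_labels, k=2, min_overlap=0.3):
    """Return indices of up to k keyframes sharing the most semantic classes
    with the current frame. ASSUMPTION: hypothetical selection rule."""
    scored = [(semantic_overlap(current_labels, kf), i)
              for i, kf in enumerate(keyframe_labels)]
    scored = [s for s in scored if s[0] >= min_overlap]
    scored.sort(reverse=True)  # highest overlap first
    return [i for _, i in scored[:k]]
```

Filtering keyframes by shared semantic classes is one simple way to favor views of the same objects during optimization; a real system would likely combine it with geometric overlap and pose criteria.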
M. Li and S. Liu contributed equally to this work.
Electronic supplementary material
Supplementary material 2 (MP4, 36,317 KB)
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Li, M. et al. (2025). SGS-SLAM: Semantic Gaussian Splatting for Neural Dense SLAM. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15089. Springer, Cham. https://doi.org/10.1007/978-3-031-72751-1_10
DOI: https://doi.org/10.1007/978-3-031-72751-1_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72750-4
Online ISBN: 978-3-031-72751-1
eBook Packages: Computer Science, Computer Science (R0)