DOI: 10.1145/3664647.3680739

GS3LAM: Gaussian Semantic Splatting SLAM

Published: 28 October 2024

Abstract

Recently, the multi-modal fusion of RGB, depth, and semantics has shown great potential in the domain of dense Simultaneous Localization and Mapping (SLAM), also known as dense semantic SLAM. Yet a prerequisite for generating consistent and continuous semantic maps is the availability of dense, efficient, and scalable scene representations. To date, semantic SLAM systems based on explicit scene representations (points/meshes/surfels) have been limited by their resolution and their inability to predict unknown areas, and thus fail to generate dense maps. In contrast, the few implicit scene representations (Neural Radiance Fields) that address these problems rely on time-consuming ray-tracing-based volume rendering, which cannot meet the real-time rendering requirements of SLAM. Fortunately, the recently emerged Gaussian Splatting scene representation inherits the efficiency and scalability of point/surfel representations while representing geometric structures smoothly and continuously, showing promise in addressing the aforementioned challenges. To this end, we propose GS3LAM, a Gaussian Semantic Splatting SLAM framework that takes multimodal data as input and renders consistent, continuous dense semantic maps in real time. To fuse multimodal data, GS3LAM models the scene as a Semantic Gaussian Field (SG-Field) and jointly optimizes camera poses and the field by establishing error constraints between observed and predicted data. Furthermore, a Depth-adaptive Scale Regularization (DSR) scheme is proposed to tackle the misalignment between scale-invariant Gaussians and geometric surfaces within the SG-Field. To mitigate the forgetting phenomenon, we propose an effective Random Sampling-based Keyframe Mapping (RSKM) strategy, which exhibits notable superiority over the local covisibility optimization strategies commonly utilized in 3DGS-based SLAM systems. Extensive experiments on benchmark datasets reveal that, compared with state-of-the-art competitors, GS3LAM demonstrates increased tracking robustness, superior real-time rendering quality, and enhanced semantic reconstruction precision. To make the results reproducible, the source code is available at https://github.com/lif314/GS3LAM.
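
As a rough illustration of how these pieces could fit together, the following Python sketch (not taken from the authors' released code) shows one possible mapping step with error constraints between rendered and observed RGB, depth, and semantics, a depth-adaptive penalty on Gaussian scales in the spirit of DSR, and random sampling over the full keyframe history in the spirit of RSKM. All names and layouts here (render_rgbd_sem, a gaussians dictionary with "scales" and "depths" entries, the lambda_* weights, and the 0.01 depth-to-scale ratio) are assumptions made for the example; the actual GS3LAM formulation may differ.

import random
import torch
import torch.nn.functional as F


def mapping_loss(gaussians, cam_pose, obs_rgb, obs_depth, obs_sem,
                 render_rgbd_sem, lambda_depth=0.5, lambda_sem=0.1, lambda_dsr=1.0):
    """One mapping iteration: render from the current Gaussians and build
    error constraints against the observed RGB-D-semantic frame."""
    pred_rgb, pred_depth, pred_sem = render_rgbd_sem(gaussians, cam_pose)

    # Photometric and geometric constraints (simple L1 terms).
    loss_rgb = (pred_rgb - obs_rgb).abs().mean()
    loss_depth = (pred_depth - obs_depth).abs().mean()

    # Semantic constraint: pred_sem is (C, H, W) class logits,
    # obs_sem is an (H, W) integer label map.
    loss_sem = F.cross_entropy(pred_sem.unsqueeze(0), obs_sem.unsqueeze(0))

    # Illustrative stand-in for depth-adaptive scale regularization:
    # cap each Gaussian's scale by a budget proportional to its depth,
    # so near-surface Gaussians stay thin and hug the geometry.
    scales = gaussians["scales"]                  # (N, 3), assumed layout
    budget = 0.01 * gaussians["depths"][:, None]  # (N, 1), assumed per-Gaussian depth
    loss_dsr = torch.relu(scales - budget).mean()

    return (loss_rgb + lambda_depth * loss_depth
            + lambda_sem * loss_sem + lambda_dsr * loss_dsr)


def sample_keyframes(num_keyframes, current_idx, k=5):
    """Randomly sample over the whole keyframe history (plus the current
    frame) rather than only locally co-visible keyframes, so earlier parts
    of the map keep receiving gradients and are not forgotten."""
    history = [i for i in range(num_keyframes) if i != current_idx]
    return [current_idx] + random.sample(history, min(k, len(history)))

In a full system, each mapping step would evaluate such a loss on the frames chosen by the keyframe sampler and backpropagate into both the Gaussian parameters and the camera pose; the exact loss terms, weights, and sampling schedule used by GS3LAM are described in the paper and repository.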

Supplemental Material

MP4 File - GS3LAM: Gaussian Semantic Splatting SLAM
Video presentation about Gaussian Semantic Splatting SLAM



Information

Published In

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
October 2024
11719 pages
ISBN: 9798400706868
DOI: 10.1145/3664647


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. 3D segmentation
  2. Gaussian splatting
  3. semantic SLAM

Qualifiers

  • Research-article


Conference

MM '24: The 32nd ACM International Conference on Multimedia
October 28 - November 1, 2024
Melbourne VIC, Australia

Acceptance Rates

MM '24 paper acceptance rate: 1,150 of 4,385 submissions (26%)
Overall acceptance rate: 2,145 of 8,556 submissions (25%)
