Open access

HFN-SLAM: Hybrid Scene Neural Representation SLAM Based on Frame Alignment and Normal Consistency

Authors:

Kai Zhang

ICCAI '24: Proceedings of the 2024 10th International Conference on Computing and Artificial Intelligence

Pages 295 - 300

https://doi.org/10.1145/3669754.3669798

Published: 30 August 2024

Abstract

Recent SLAM methods based on neural radiance fields have demonstrated promising performance. However, existing methods still fall short in reconstruction and pose estimation accuracy, particularly in medium-to-large indoor scenes. These limitations stem from inadequate use of structural scene information and from ineffective constraints on errors that accumulate across sequences. To address these challenges, we introduce HFN-SLAM, a neurovisual SLAM system designed to achieve real-time, high-fidelity scene reconstruction and robust camera tracking. To attain fine-grained scene reconstruction without compromising real-time performance, we propose a hybrid scene representation. It integrates high-resolution, dense 3D hash grid features with 2D plane features, improving reconstruction accuracy while minimizing parameter overhead. To exploit information shared between input frames and mitigate accumulated sequence errors, we introduce a frame alignment algorithm that globally aligns input sequences by constraining reprojection errors between keyframes. Furthermore, to enhance scene details, we propose a region-aware normal consistency method, which imposes constraints on large planar regions of the scene and thereby facilitates detailed reconstruction. Experimental results demonstrate that our method runs at 3-5 Hz on a desktop PC and surpasses existing methods in both scene reconstruction and camera tracking performance.
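
The abstract does not spell out how the reprojection error between keyframes is formulated. As a rough illustration only, the sketch below computes a generic pinhole reprojection error between two keyframes: pixels with depth from keyframe i are lifted to 3D, moved into keyframe j, projected, and compared with matched pixels. Every name here (`reproject`, `mean_reprojection_error`), the camera-to-world pose convention, and the `(uv_i, depth_i, uv_j)` match format are assumptions made for this example and are not taken from the paper.

```python
import math


def mat_vec(R, v):
    """Multiply a 3x3 matrix (nested lists) by a 3-vector."""
    return [sum(R[r][c] * v[c] for c in range(3)) for r in range(3)]


def transpose(R):
    """Transpose a 3x3 matrix; for a rotation this is its inverse."""
    return [[R[c][r] for c in range(3)] for r in range(3)]


def reproject(uv, depth, intr, pose_i, pose_j):
    """Project a pixel of keyframe i (with known depth) into keyframe j.

    intr   : (fx, fy, cx, cy) pinhole intrinsics (assumed shared by both frames)
    pose_* : (R, t) camera-to-world rotation and translation (assumed convention)
    Assumes the point lies in front of camera j (positive depth).
    """
    fx, fy, cx, cy = intr
    R_i, t_i = pose_i
    R_j, t_j = pose_j
    # Back-project the pixel into keyframe i's camera frame.
    p_cam_i = [(uv[0] - cx) / fx * depth, (uv[1] - cy) / fy * depth, depth]
    # Camera i -> world.
    p_world = [a + b for a, b in zip(mat_vec(R_i, p_cam_i), t_i)]
    # World -> camera j, i.e. R_j^T (p - t_j).
    p_cam_j = mat_vec(transpose(R_j), [a - b for a, b in zip(p_world, t_j)])
    # Perspective projection into keyframe j's image plane.
    return (fx * p_cam_j[0] / p_cam_j[2] + cx, fy * p_cam_j[1] / p_cam_j[2] + cy)


def mean_reprojection_error(matches, intr, pose_i, pose_j):
    """Mean pixel distance between reprojected points and their matches.

    matches: list of (uv_i, depth_i, uv_j) tuples (hypothetical input format).
    """
    errors = []
    for uv_i, depth_i, uv_j in matches:
        u, v = reproject(uv_i, depth_i, intr, pose_i, pose_j)
        errors.append(math.hypot(u - uv_j[0], v - uv_j[1]))
    return sum(errors) / len(errors)
```

In a pose optimization, a quantity like this would be minimized with respect to the keyframe poses; how HFN-SLAM selects keyframe pairs, weights the residuals, and combines them with its other losses is not stated in the abstract.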

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision

Published In

ICCAI '24: Proceedings of the 2024 10th International Conference on Computing and Artificial Intelligence

April 2024

491 pages

ISBN:9798400717055

DOI:10.1145/3669754

Copyright © 2024 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

ICCAI 2024

ICCAI 2024: 2024 10th International Conference on Computing and Artificial Intelligence

April 26 - 29, 2024

Bali Island, Indonesia
