Abstract
Creating 3D semantic reconstructions of environments is fundamental to many applications, especially those related to autonomous agent operation (e.g., goal-oriented navigation or object interaction and manipulation). Commonly, 3D semantic reconstruction systems capture the entire scene at the same level of detail. However, certain tasks (e.g., object interaction) require a fine-grained, high-resolution map, particularly if the objects to be interacted with are small or have intricate geometry. In current practice, this leads to the entire map being stored at the same high resolution, which increases computational and storage costs. To address this challenge, we propose MAP-ADAPT, a real-time method for quality-adaptive semantic 3D reconstruction using RGBD frames. MAP-ADAPT is the first adaptive semantic 3D mapping algorithm that, unlike prior work, directly generates a single map with regions of different quality based on both the semantic information and the geometric complexity of the scene. Leveraging a semantic SLAM pipeline for pose and semantic estimation, we achieve comparable or superior results to state-of-the-art methods on synthetic and real-world data, while significantly reducing storage and computation requirements. Code is available at https://map-adapt.github.io/.
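The core idea of the abstract, picking a per-region map quality from both semantics and geometric complexity, can be sketched as follows. This is an illustrative toy, not the MAP-ADAPT implementation: the quality levels, voxel sizes, class set (`INTERACTION_CLASSES`), and the curvature threshold are all hypothetical choices for demonstration.

```python
from dataclasses import dataclass

# Hypothetical quality levels: 0 = coarse (8 cm), 1 = medium (4 cm), 2 = fine (2 cm).
VOXEL_SIZES = {0: 0.08, 1: 0.04, 2: 0.02}

# Hypothetical set of semantic classes the agent may need to interact with;
# these regions are mapped at the finest resolution.
INTERACTION_CLASSES = {"cup", "handle", "switch"}

@dataclass
class Region:
    semantic_label: str
    curvature: float  # proxy for local geometric complexity, in [0, 1]

def quality_level(region: Region) -> int:
    """Pick the finest level demanded by either semantics or geometry."""
    if region.semantic_label in INTERACTION_CLASSES:
        return 2  # small interactable objects: fine resolution
    if region.curvature > 0.5:
        return 1  # intricate geometry: medium resolution
    return 0      # flat, non-interactable regions (e.g., walls): coarse

def voxel_size(region: Region) -> float:
    return VOXEL_SIZES[quality_level(region)]
```

Under this scheme a flat wall region would be stored at 8 cm voxels, while a cup would receive 2 cm voxels, so only the task-relevant parts of the map pay the high-resolution storage cost.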
Notes
1. Even though we describe the map assuming three levels of hierarchy, its depth in our implementation can be chosen arbitrarily, depending on the application at hand.
2. Even though we demonstrate MAP-ADAPT with object categories, other semantic information can be used, e.g., material, function, or change.
3. We employ the validation set since the test set does not have publicly available annotations.
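Note 1 says the hierarchy depth is a free parameter rather than fixed at three levels. A minimal sketch of what an arbitrary-depth voxel hierarchy implies, assuming (hypothetically) that each level halves the voxel edge length from a chosen root size:

```python
def voxel_size_at(level: int, root_size: float = 0.08) -> float:
    """Voxel edge length at a given hierarchy level (level 0 = coarsest).

    Assumes each level halves the edge length; root_size is an
    illustrative 8 cm, not a value prescribed by the paper.
    """
    if level < 0:
        raise ValueError("hierarchy level must be non-negative")
    return root_size / (2 ** level)
```

With three levels this yields 8 cm, 4 cm, and 2 cm voxels, but nothing prevents choosing a deeper hierarchy for applications that need finer detail.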
Acknowledgement
This project was supported by the ETH RobotX research grant.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Zheng, J., Barath, D., Pollefeys, M., Armeni, I. (2025). MAP-ADAPT: Real-Time Quality-Adaptive Semantic 3D Maps. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15097. Springer, Cham. https://doi.org/10.1007/978-3-031-72933-1_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72932-4
Online ISBN: 978-3-031-72933-1