Abstract
Creating 3D semantic reconstructions of environments is fundamental to many applications, especially those related to autonomous agent operation (e.g., goal-oriented navigation or object interaction and manipulation). Commonly, 3D semantic reconstruction systems capture the entire scene at the same level of detail. However, certain tasks (e.g., object interaction) require a fine-grained, high-resolution map, particularly if the objects to be interacted with are small or have intricate geometry. In current practice, this leads to the entire map being stored at the same high resolution, which increases computational and storage costs. To address this challenge, we propose MAP-ADAPT, a real-time method for quality-adaptive semantic 3D reconstruction using RGBD frames. MAP-ADAPT is the first adaptive semantic 3D mapping algorithm that, unlike prior work, directly generates a single map with regions of different quality based on both the semantic information and the geometric complexity of the scene. Leveraging a semantic SLAM pipeline for pose and semantic estimation, we achieve comparable or superior results to state-of-the-art methods on synthetic and real-world data, while significantly reducing storage and computation requirements. Code is available at https://map-adapt.github.io/.
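The core idea of the abstract, picking a per-region map quality from both semantics and geometric complexity, can be sketched as follows. This is an illustrative toy, not the MAP-ADAPT implementation: the quality levels, voxel sizes, class set (`INTERACTION_CLASSES`), and the curvature threshold are all hypothetical choices for demonstration.

```python
from dataclasses import dataclass

# Hypothetical quality levels: 0 = coarse (8 cm), 1 = medium (4 cm), 2 = fine (2 cm).
VOXEL_SIZES = {0: 0.08, 1: 0.04, 2: 0.02}

# Hypothetical set of semantic classes the agent may need to interact with;
# these regions are mapped at the finest resolution.
INTERACTION_CLASSES = {"cup", "handle", "switch"}

@dataclass
class Region:
    semantic_label: str
    curvature: float  # proxy for local geometric complexity, in [0, 1]

def quality_level(region: Region) -> int:
    """Pick the finest level demanded by either semantics or geometry."""
    if region.semantic_label in INTERACTION_CLASSES:
        return 2  # small interactable objects: fine resolution
    if region.curvature > 0.5:
        return 1  # intricate geometry: medium resolution
    return 0      # flat, non-interactable regions (e.g., walls): coarse

def voxel_size(region: Region) -> float:
    return VOXEL_SIZES[quality_level(region)]
```

Under this scheme a flat wall region would be stored at 8 cm voxels, while a cup would receive 2 cm voxels, so only the task-relevant parts of the map pay the high-resolution storage cost.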
Notes
1. Even though we describe the map assuming three levels of hierarchy, its depth in our implementation can be chosen arbitrarily, depending on the application at hand.
2. Even though we demonstrate MAP-ADAPT with object categories, other semantic information can be used, e.g., material, function, or change.
3. We employ the validation set since the test set does not have publicly available annotations.
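Note 1 says the hierarchy depth is a free parameter rather than fixed at three levels. A minimal sketch of what an arbitrary-depth voxel hierarchy implies, assuming (hypothetically) that each level halves the voxel edge length from a chosen root size:

```python
def voxel_size_at(level: int, root_size: float = 0.08) -> float:
    """Voxel edge length at a given hierarchy level (level 0 = coarsest).

    Assumes each level halves the edge length; root_size is an
    illustrative 8 cm, not a value prescribed by the paper.
    """
    if level < 0:
        raise ValueError("hierarchy level must be non-negative")
    return root_size / (2 ** level)
```

With three levels this yields 8 cm, 4 cm, and 2 cm voxels, but nothing prevents choosing a deeper hierarchy for applications that need finer detail.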
Acknowledgement
This project was supported by the ETH RobotX research grant.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Zheng, J., Barath, D., Pollefeys, M., Armeni, I. (2025). MAP-ADAPT: Real-Time Quality-Adaptive Semantic 3D Maps. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15097. Springer, Cham. https://doi.org/10.1007/978-3-031-72933-1_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72932-4
Online ISBN: 978-3-031-72933-1