MAP-ADAPT: Real-Time Quality-Adaptive Semantic 3D Maps

  • Conference paper
  • Part of: Computer Vision – ECCV 2024 (ECCV 2024)

Abstract

Creating 3D semantic reconstructions of environments is fundamental to many applications, especially those related to autonomous agent operation (e.g., goal-oriented navigation or object interaction and manipulation). Commonly, 3D semantic reconstruction systems capture the entire scene at the same level of detail. However, certain tasks (e.g., object interaction) require a fine-grained, high-resolution map, particularly if the objects to be interacted with are small or have intricate geometry. In current practice, this leads to the entire map being stored at the same high resolution, which increases computational and storage costs. To address this challenge, we propose MAP-ADAPT, a real-time method for quality-adaptive semantic 3D reconstruction from RGBD frames. MAP-ADAPT is the first adaptive semantic 3D mapping algorithm that, unlike prior work, directly generates a single map with regions of different quality based on both the semantic information and the geometric complexity of the scene. Leveraging a semantic SLAM pipeline for pose and semantic estimation, we achieve comparable or superior results to state-of-the-art methods on synthetic and real-world data, while significantly reducing storage and computation requirements. Code is available at https://map-adapt.github.io/.
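
To make the quality-adaptive idea concrete, below is a minimal sketch in Python of how a per-region voxel size could be chosen from a semantic label and a local geometric-complexity measure. This is not the authors' implementation: the class-to-resolution table, the complexity threshold, and all names are hypothetical.

    import numpy as np

    # Hypothetical semantic-to-resolution table (meters): classes an agent is
    # likely to interact with get finer voxels; large structures stay coarse.
    VOXEL_SIZE_BY_CLASS = {
        "wall": 0.08, "floor": 0.08,   # coarse
        "table": 0.04,                 # medium
        "cup": 0.02, "handle": 0.02,   # fine: small, manipulable objects
    }
    DEFAULT_VOXEL_SIZE = 0.04

    def surface_variation(points: np.ndarray) -> float:
        """Geometric-complexity proxy: smallest eigenvalue of the local point
        covariance divided by the eigenvalue sum (near 0 for a plane, larger
        for curved or intricate surfaces)."""
        if points.shape[0] < 10:
            return 0.0
        eigvals = np.linalg.eigvalsh(np.cov(points.T))  # ascending order
        return float(eigvals[0] / max(eigvals.sum(), 1e-12))

    def block_voxel_size(semantic_class: str, points: np.ndarray) -> float:
        """Pick the finer of the semantics- and geometry-driven voxel sizes."""
        size = VOXEL_SIZE_BY_CLASS.get(semantic_class, DEFAULT_VOXEL_SIZE)
        if surface_variation(points) > 0.05:  # hypothetical threshold
            size = min(size, 0.02)            # refine intricate geometry
        return size

The surface-variation measure is the standard covariance-eigenvalue heuristic: near-planar regions score close to zero and can remain coarse, while curved or cluttered surfaces trigger refinement regardless of their semantic class.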

Notes

  1. Even though we describe the map assuming three levels of hierarchy, its depth in our implementation can be chosen arbitrarily, depending on the application at hand (a small sketch of such a hierarchy follows these notes).

  2. Even though we demonstrate MAP-ADAPT with object categories, other semantic information can be used, e.g., material, function, or change.

  3. We employ the validation set since the test set does not have publicly available annotations.
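
Under the assumption of an octree-style subdivision, where each hierarchy level halves the voxel edge (the halving rule and base size below are illustrative assumptions, not taken from the paper), the configurable depth mentioned in note 1 reduces to a single parameter:

    ROOT_VOXEL_SIZE = 0.08  # hypothetical coarsest resolution, in meters

    def voxel_size_at_level(level: int) -> float:
        """Level 0 is coarsest; each deeper level halves the voxel edge."""
        return ROOT_VOXEL_SIZE / (2 ** level)

    # Three levels of hierarchy, as in the paper's implementation:
    print([voxel_size_at_level(k) for k in range(3)])  # [0.08, 0.04, 0.02]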

Acknowledgement

This project was supported by the ETH RobotX research grant.

Author information

Corresponding author

Correspondence to Jianhao Zheng.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 7055 KB)

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zheng, J., Barath, D., Pollefeys, M., Armeni, I. (2025). MAP-ADAPT: Real-Time Quality-Adaptive Semantic 3D Maps. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15097. Springer, Cham. https://doi.org/10.1007/978-3-031-72933-1_13

  • DOI: https://doi.org/10.1007/978-3-031-72933-1_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72932-4

  • Online ISBN: 978-3-031-72933-1

  • eBook Packages: Computer Science, Computer Science (R0)
