RGB-D SLAM in Dynamic Environments with Multilevel Semantic Mapping

  • Short Paper
  • Journal of Intelligent & Robotic Systems

Abstract

Dynamic environments pose a severe challenge to visual SLAM because moving objects violate the assumption of a static background. Although recent works employ deep learning to address this challenge, they still cannot determine whether an object is actually moving, which misleads both object tracking and background reconstruction. We therefore design a SLAM system that simultaneously estimates the camera trajectory and constructs object-level dense 3D semantic maps in dynamic environments. Combining deep learning-based object detection with geometric constraints, we use optical flow and the relationships between objects to identify objects that are predefined as static but are in fact moving. To construct more precise 3D semantic maps, our method employs an unsupervised algorithm that segments the 3D point cloud generated from depth data into meaningful clusters. These point clusters are then fused with semantic cues produced by deep learning to yield a more accurate 3D semantic map. We evaluate the proposed system on the TUM RGB-D and ICL-NUIM datasets as well as in real-world indoor environments. Qualitative and quantitative experiments show that our method outperforms state-of-the-art approaches in various dynamic scenes in terms of both accuracy and robustness.
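The dynamic-object check described above, combining optical flow with a geometric constraint, can be sketched with the standard epipolar test: a point tracked by optical flow from a static part of the scene should land on (or very near) the epipolar line induced by the fundamental matrix, while a point on a moving object will not. This is a minimal illustrative sketch, not the paper's implementation; the function names, the threshold, and the toy fundamental matrix in the usage below are assumptions.

```python
# Hedged sketch: flag flow-tracked feature matches as dynamic when they
# violate the epipolar constraint. For a static 3D point, the match p2 of
# a previous-frame point p1 should lie on the epipolar line l = F @ [p1; 1].
# Points whose distance to that line exceeds a threshold are likely on a
# moving object. Names and threshold are illustrative assumptions.

def epipolar_distance(F, p1, p2):
    """Distance (pixels) of p2 = (x2, y2) from the epipolar line of p1 = (x1, y1)."""
    x1, y1 = p1
    x2, y2 = p2
    # Epipolar line coefficients: l = F @ [x1, y1, 1]^T
    a = F[0][0] * x1 + F[0][1] * y1 + F[0][2]
    b = F[1][0] * x1 + F[1][1] * y1 + F[1][2]
    c = F[2][0] * x1 + F[2][1] * y1 + F[2][2]
    # Point-to-line distance |a*x2 + b*y2 + c| / sqrt(a^2 + b^2)
    return abs(a * x2 + b * y2 + c) / (a * a + b * b) ** 0.5

def flag_dynamic(F, matches, thresh=1.0):
    """For each (p1, p2) flow match, True if it violates the epipolar constraint."""
    return [epipolar_distance(F, p1, p2) > thresh for p1, p2 in matches]

# Toy example: this F makes the epipolar line of (x1, y1) the horizontal
# line y = y1, so the test reduces to |y1 - y2|.
F = [[0.0, 0.0, 0.0],
     [0.0, 0.0, -1.0],
     [0.0, 1.0, 0.0]]
print(flag_dynamic(F, [((0, 5), (3, 5)), ((0, 5), (3, 8))]))  # [False, True]
```

In the full system the fundamental matrix would come from matched static features (e.g. via RANSAC), and per-point decisions would be aggregated per detected object before deciding whether to exclude it from tracking.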
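The fusion of unsupervised geometric clusters with semantic cues can likewise be sketched as a per-cluster label vote: each 3D point carries the semantic class its 2D projection received from the detector, and the cluster adopts the majority class. This is an assumed simplification of such a fusion step, not the paper's exact algorithm; all names are illustrative.

```python
# Hedged sketch: assign each geometric point-cloud cluster the most frequent
# semantic label among its member points. cluster_ids[i] is the cluster of
# 3D point i (e.g. from an unsupervised segmentation); labels[i] is the class
# predicted for its 2D projection. Illustrative, not the paper's algorithm.
from collections import Counter, defaultdict

def fuse_cluster_labels(cluster_ids, labels):
    """Majority-vote a semantic label for every geometric cluster."""
    votes = defaultdict(Counter)
    for cid, lab in zip(cluster_ids, labels):
        votes[cid][lab] += 1
    # most_common(1) returns [(label, count)] for the winning label
    return {cid: counter.most_common(1)[0][0] for cid, counter in votes.items()}

# Two clusters; one noisy "table" vote is outvoted within cluster 0.
fused = fuse_cluster_labels([0, 0, 0, 1, 1],
                            ["chair", "chair", "table", "monitor", "monitor"])
print(fused)  # {0: 'chair', 1: 'monitor'}
```

Voting over geometrically coherent clusters smooths the pixel-level noise of the 2D predictions, which is what makes the resulting object-level map more accurate than projecting per-pixel labels directly.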


Data Availability

All data and materials generated or analysed during this study are included in this published article and its supplementary information files.

Code Availability

The code generated during the current study will soon be available in the jason-yspjf repository (https://github.com/jason-yspjf).


Funding

This research is partially supported by the National Natural Science Foundation of China (Grant Nos. 42192580 and 42192583), the Hubei Province Natural Science Foundation (Grant No. 2021CFA088), the Science and Technology Major Project (Grant No. 2021AAA010), and the Wuhan University - Huawei Geoinformatics Innovation Laboratory. We would also like to acknowledge the supercomputing system of the Supercomputing Center of Wuhan University for supporting the numerical calculations.

Author information

Authors and Affiliations

Authors

Contributions

All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Yusheng Qin and Tiancan Mei. The first draft of the manuscript was written by Yusheng Qin and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Tiancan Mei.

Ethics declarations

Ethics Approval

The authors declare that no human or animal subjects are involved in the study.

Consent to Participate

Informed consent was obtained from all individual participants included in the study.

Consent for Publication

Patients signed informed consent regarding publishing their data and photographs.

Conflicts of Interest

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Qin, Y., Mei, T., Gao, Z. et al. RGB-D SLAM in Dynamic Environments with Multilevel Semantic Mapping. J Intell Robot Syst 105, 90 (2022). https://doi.org/10.1007/s10846-022-01697-y

