RGB-D SLAM in Dynamic Environments with Multilevel Semantic Mapping

  • Short Paper
  • Journal of Intelligent & Robotic Systems

Abstract

Dynamic environments pose a severe challenge to visual SLAM because moving objects violate the assumption of a static background. Although recent works employ deep learning to address this challenge, they still cannot determine whether an object is actually moving, which misleads both object tracking and background reconstruction. We therefore design a SLAM system that simultaneously estimates the camera trajectory and constructs object-level dense 3D semantic maps in dynamic environments. Combining deep learning-based object detection with geometric constraints, we use optical flow and the relationships between objects to identify objects that are predefined as static but are in fact moving. To construct more precise 3D semantic maps, our method employs an unsupervised algorithm that segments the 3D point cloud generated from depth data into meaningful clusters. These point clusters are then fused with semantic cues produced by deep learning to yield a more accurate 3D semantic map. We evaluate the proposed system on the TUM RGB-D and ICL-NUIM datasets as well as in real-world indoor environments. Qualitative and quantitative experiments show that our method outperforms state-of-the-art approaches in various dynamic scenes in terms of both accuracy and robustness.
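The dynamic-object check described above, combining optical flow with a geometric constraint, can be sketched with the standard epipolar test: a point tracked by optical flow from a static part of the scene should land on (or very near) the epipolar line induced by the fundamental matrix, while a point on a moving object will not. This is a minimal illustrative sketch, not the paper's implementation; the function names, the threshold, and the toy fundamental matrix in the usage below are assumptions.

```python
# Hedged sketch: flag flow-tracked feature matches as dynamic when they
# violate the epipolar constraint. For a static 3D point, the match p2 of
# a previous-frame point p1 should lie on the epipolar line l = F @ [p1; 1].
# Points whose distance to that line exceeds a threshold are likely on a
# moving object. Names and threshold are illustrative assumptions.

def epipolar_distance(F, p1, p2):
    """Distance (pixels) of p2 = (x2, y2) from the epipolar line of p1 = (x1, y1)."""
    x1, y1 = p1
    x2, y2 = p2
    # Epipolar line coefficients: l = F @ [x1, y1, 1]^T
    a = F[0][0] * x1 + F[0][1] * y1 + F[0][2]
    b = F[1][0] * x1 + F[1][1] * y1 + F[1][2]
    c = F[2][0] * x1 + F[2][1] * y1 + F[2][2]
    # Point-to-line distance |a*x2 + b*y2 + c| / sqrt(a^2 + b^2)
    return abs(a * x2 + b * y2 + c) / (a * a + b * b) ** 0.5

def flag_dynamic(F, matches, thresh=1.0):
    """For each (p1, p2) flow match, True if it violates the epipolar constraint."""
    return [epipolar_distance(F, p1, p2) > thresh for p1, p2 in matches]

# Toy example: this F makes the epipolar line of (x1, y1) the horizontal
# line y = y1, so the test reduces to |y1 - y2|.
F = [[0.0, 0.0, 0.0],
     [0.0, 0.0, -1.0],
     [0.0, 1.0, 0.0]]
print(flag_dynamic(F, [((0, 5), (3, 5)), ((0, 5), (3, 8))]))  # [False, True]
```

In the full system the fundamental matrix would come from matched static features (e.g. via RANSAC), and per-point decisions would be aggregated per detected object before deciding whether to exclude it from tracking.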
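The fusion of unsupervised geometric clusters with semantic cues can likewise be sketched as a per-cluster label vote: each 3D point carries the semantic class its 2D projection received from the detector, and the cluster adopts the majority class. This is an assumed simplification of such a fusion step, not the paper's exact algorithm; all names are illustrative.

```python
# Hedged sketch: assign each geometric point-cloud cluster the most frequent
# semantic label among its member points. cluster_ids[i] is the cluster of
# 3D point i (e.g. from an unsupervised segmentation); labels[i] is the class
# predicted for its 2D projection. Illustrative, not the paper's algorithm.
from collections import Counter, defaultdict

def fuse_cluster_labels(cluster_ids, labels):
    """Majority-vote a semantic label for every geometric cluster."""
    votes = defaultdict(Counter)
    for cid, lab in zip(cluster_ids, labels):
        votes[cid][lab] += 1
    # most_common(1) returns [(label, count)] for the winning label
    return {cid: counter.most_common(1)[0][0] for cid, counter in votes.items()}

# Two clusters; one noisy "table" vote is outvoted within cluster 0.
fused = fuse_cluster_labels([0, 0, 0, 1, 1],
                            ["chair", "chair", "table", "monitor", "monitor"])
print(fused)  # {0: 'chair', 1: 'monitor'}
```

Voting over geometrically coherent clusters smooths the pixel-level noise of the 2D predictions, which is what makes the resulting object-level map more accurate than projecting per-pixel labels directly.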


Data Availability

All data and materials generated or analysed during this study are included in this published article and its supplementary information files.

Code Availability

The code generated during the current study will soon be available in the jason-yspjf repository (https://github.com/jason-yspjf).


Funding

This research is partially supported by the National Natural Science Foundation of China (Grant Nos. 42192580 and 42192583), the Hubei Province Natural Science Foundation (Grant No. 2021CFA088), the Science and Technology Major Project (Grant No. 2021AAA010), and the Wuhan University - Huawei Geoinformatics Innovation Laboratory. We would also like to acknowledge the supercomputing system of the Supercomputing Center of Wuhan University for supporting the numerical calculations.

Author information

Authors and Affiliations

Authors

Contributions

All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Yusheng Qin and Tiancan Mei. The first draft of the manuscript was written by Yusheng Qin and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Tiancan Mei.

Ethics declarations

Ethics Approval

The authors declare that no human or animal subjects are involved in the study.

Consent to Participate

Informed consent was obtained from all individual participants included in the study.

Consent for Publication

Patients signed informed consent regarding publishing their data and photographs.

Conflicts of Interest

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Qin, Y., Mei, T., Gao, Z. et al. RGB-D SLAM in Dynamic Environments with Multilevel Semantic Mapping. J Intell Robot Syst 105, 90 (2022). https://doi.org/10.1007/s10846-022-01697-y

