
Crowd-SLAM: Visual SLAM Towards Crowded Environments using Object Detection

  • Regular Paper
  • Published in: Journal of Intelligent & Robotic Systems

Abstract

Simultaneous Localization and Mapping (SLAM) is a fundamental problem in mobile robotics. However, the majority of Visual SLAM algorithms assume a static scenario, limiting their applicability in real-world environments. Dealing with dynamic content in Visual SLAM is still an open problem, and existing solutions usually rely on purely geometric approaches. Deep learning techniques can improve the SLAM solution in environments with a priori dynamic objects by providing high-level information about the scene. However, most such solutions are not prepared to deal with crowded scenarios. This paper presents Crowd-SLAM, a new approach to SLAM for crowded environments using object detection. The main objective is to achieve high accuracy while running faster than comparable methods. The system is built on ORB-SLAM2, a state-of-the-art SLAM system. The proposed methodology is evaluated on benchmark datasets, outperforming other Visual SLAM methods.
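To make the idea concrete, the sketch below (OpenCV-style C++) illustrates the common strategy in detection-based dynamic SLAM of discarding feature points that fall inside detected-person bounding boxes, so that camera tracking relies only on the static background. This is an illustrative example under stated assumptions, not the authors' implementation; the function name filterDynamicKeypoints is hypothetical, and the person boxes are assumed to come from any object detector.

    // Minimal sketch (not the Crowd-SLAM code): keep only the ORB keypoints
    // that fall outside detected-person bounding boxes, so that tracking and
    // pose estimation rely on static parts of the scene.
    #include <opencv2/core.hpp>
    #include <vector>

    std::vector<cv::KeyPoint> filterDynamicKeypoints(
        const std::vector<cv::KeyPoint>& keypoints,
        const std::vector<cv::Rect>& personBoxes)  // detections for one frame
    {
        std::vector<cv::KeyPoint> staticKeypoints;
        staticKeypoints.reserve(keypoints.size());
        for (const auto& kp : keypoints) {
            bool insideDetection = false;
            for (const auto& box : personBoxes) {
                // cv::Rect::contains tests whether the point lies in the box
                if (box.contains(cv::Point(static_cast<int>(kp.pt.x),
                                           static_cast<int>(kp.pt.y)))) {
                    insideDetection = true;
                    break;
                }
            }
            if (!insideDetection) {
                staticKeypoints.push_back(kp);
            }
        }
        return staticKeypoints;
    }

In such a pipeline, the surviving keypoints would then proceed through matching and pose estimation as in a standard ORB-SLAM2 front end; the detection step only changes which features are allowed to participate.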


Availability of Data and Materials

The code and datasets used are publicly available at:

- Crowd-SLAM: https://github.com/virgolinosoares/Crowd-SLAM
- MOT Challenge: https://motchallenge.net/
- TUM RGB-D Dataset: https://vision.in.tum.de/data/datasets/rgbd-dataset
- LOEWENPLATZ: https://data.vision.ee.ethz.ch/cvl/aess/dataset/
- Bonn RGB-D Dynamic Dataset: http://www.ipb.uni-bonn.de/data/rgbd-dynamic-dataset/


Funding

Partial financial support was received from the Brazilian National Council for Scientific and Technological Development (CNPq).

Author information


Contributions

All authors contributed to the conception and design of the methodology. João Carlos Virgolino Soares collected the data, performed the analysis, and wrote the manuscript. Marcelo Gattass and Marco Antonio Meggiolaro reviewed and approved the manuscript.

Corresponding author

Correspondence to João Carlos Virgolino Soares.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

(MP4 124 MB)

About this article

Cite this article

Soares, J.C.V., Gattass, M. & Meggiolaro, M.A. Crowd-SLAM: Visual SLAM Towards Crowded Environments using Object Detection. J Intell Robot Syst 102, 50 (2021). https://doi.org/10.1007/s10846-021-01414-1

