
Crowd-SLAM: Visual SLAM Towards Crowded Environments using Object Detection

  • Regular Paper
  • Published in: Journal of Intelligent & Robotic Systems

Abstract

Simultaneous Localization and Mapping (SLAM) is a fundamental problem in mobile robotics. However, the majority of Visual SLAM algorithms assume a static scenario, limiting their applicability in real-world environments. Dealing with dynamic content in Visual SLAM is still an open problem, and existing solutions usually rely on purely geometric approaches. Deep learning techniques can improve the SLAM solution in environments with a priori dynamic objects by providing high-level information about the scene. However, most such solutions are not prepared to deal with crowded scenarios. This paper presents Crowd-SLAM, a new approach to SLAM for crowded environments using object detection. The main objective is to achieve high accuracy while running faster than comparable methods. The system is built on ORB-SLAM2, a state-of-the-art SLAM system. The proposed methodology is evaluated on benchmark datasets, outperforming other Visual SLAM methods.
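To make the idea concrete, the sketch below (OpenCV-style C++) illustrates the common strategy in detection-based dynamic SLAM of discarding feature points that fall inside detected-person bounding boxes, so that camera tracking relies only on the static background. This is an illustrative example under stated assumptions, not the authors' implementation; the function name filterDynamicKeypoints is hypothetical, and the person boxes are assumed to come from any object detector.

    // Minimal sketch (not the Crowd-SLAM code): keep only the ORB keypoints
    // that fall outside detected-person bounding boxes, so that tracking and
    // pose estimation rely on static parts of the scene.
    #include <opencv2/core.hpp>
    #include <vector>

    std::vector<cv::KeyPoint> filterDynamicKeypoints(
        const std::vector<cv::KeyPoint>& keypoints,
        const std::vector<cv::Rect>& personBoxes)  // detections for one frame
    {
        std::vector<cv::KeyPoint> staticKeypoints;
        staticKeypoints.reserve(keypoints.size());
        for (const auto& kp : keypoints) {
            bool insideDetection = false;
            for (const auto& box : personBoxes) {
                // cv::Rect::contains tests whether the point lies in the box
                if (box.contains(cv::Point(static_cast<int>(kp.pt.x),
                                           static_cast<int>(kp.pt.y)))) {
                    insideDetection = true;
                    break;
                }
            }
            if (!insideDetection) {
                staticKeypoints.push_back(kp);
            }
        }
        return staticKeypoints;
    }

In such a pipeline, the surviving keypoints would then proceed through matching and pose estimation as in a standard ORB-SLAM2 front end; the detection step only changes which features are allowed to participate.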


Availability of Data and Materials

The code and datasets used are publicly available at:

- Crowd-SLAM: https://github.com/virgolinosoares/Crowd-SLAM
- MOT Challenge: https://motchallenge.net/
- TUM RGB-D Dataset: https://vision.in.tum.de/data/datasets/rgbd-dataset
- LOEWENPLATZ: https://data.vision.ee.ethz.ch/cvl/aess/dataset/
- Bonn RGB-D Dynamic Dataset: http://www.ipb.uni-bonn.de/data/rgbd-dynamic-dataset/


Funding

Partial financial support was received from the Brazilian National Council for Scientific and Technological Development (CNPq).

Author information


Contributions

All authors contributed to the conception and design of the methodology. João Carlos Virgolino Soares collected the data, performed the analysis, and wrote the manuscript. Marcelo Gattass and Marco Antonio Meggiolaro reviewed and approved the manuscript.

Corresponding author

Correspondence to João Carlos Virgolino Soares.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

(MP4 124 MB)

About this article

Cite this article

Soares, J.C.V., Gattass, M. & Meggiolaro, M.A. Crowd-SLAM: Visual SLAM Towards Crowded Environments using Object Detection. J Intell Robot Syst 102, 50 (2021). https://doi.org/10.1007/s10846-021-01414-1

