Dynamic Scene Vision SLAM Based on Target Detection in RGB-D Images

Authors:

Wangge Bao

ICCPR '23: Proceedings of the 2023 12th International Conference on Computing and Pattern Recognition

Pages 195 - 200

https://doi.org/10.1145/3633637.3633666

Published: 28 February 2024

Abstract

Visual SLAM is easily disturbed by movable objects in dynamic scenes: key points that fall on such objects are inaccurate, which reduces localization accuracy and robustness. To address this problem, this paper proposes a visual SLAM algorithm for dynamic scenes based on target detection in RGB-D images. The algorithm first identifies movable objects in the scene using the YOLOv5 target detector, whose results are transmitted to a SLAM framework through socket communication. A threshold operation on the depth map is then used to generate a mask of the movable objects, and the input with the movable objects removed is fed into the ORB-SLAM2 system. Experimental results show that the proposed algorithm successfully handles dynamic scenes and achieves a better balance between processing speed and the localization accuracy of the reconstructed map than some other SLAM systems for dynamic scenes.
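
The masking step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact threshold rule, so the version below is an assumption: inside each detection box, pixels whose depth lies within a band around the box's median depth are treated as the movable object, and key points landing on that mask are discarded. The function names, the `tolerance` parameter, and the box format `(x1, y1, x2, y2)` are hypothetical and are not taken from the paper.

```python
import numpy as np


def movable_object_mask(depth, boxes, tolerance=0.3):
    """Return a boolean mask, True on pixels assumed to lie on movable objects.

    depth:     (H, W) float array in metres, 0 where depth is missing.
    boxes:     iterable of (x1, y1, x2, y2) pixel boxes from the detector.
    tolerance: hypothetical half-width (metres) of the depth band kept
               around the median depth of each box.
    """
    mask = np.zeros(depth.shape, dtype=bool)
    h, w = depth.shape
    for x1, y1, x2, y2 in boxes:
        # Clip the box to the image so slicing stays in bounds.
        x1, y1 = max(0, int(x1)), max(0, int(y1))
        x2, y2 = min(w, int(x2)), min(h, int(y2))
        if x2 <= x1 or y2 <= y1:
            continue
        roi = depth[y1:y2, x1:x2]
        valid = roi > 0
        if not valid.any():
            continue
        # Assumption: the detected object dominates its box, so the median
        # depth of the box approximates the object's depth.
        centre = np.median(roi[valid])
        mask[y1:y2, x1:x2] |= valid & (np.abs(roi - centre) <= tolerance)
    return mask


def filter_keypoints(keypoints, mask):
    """Keep only the (x, y) key points that do not fall on the mask."""
    kept = []
    for x, y in keypoints:
        col, row = int(round(x)), int(round(y))
        inside = 0 <= row < mask.shape[0] and 0 <= col < mask.shape[1]
        if not (inside and mask[row, col]):
            kept.append((x, y))
    return kept
```

Compared with discarding every key point inside a detection box, thresholding on depth keeps background points that happen to fall in the box corners, which is one plausible reason to combine a box detector with the depth map. The median rule fails when the object fills less than half of its box, so a real system would need a more careful choice.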

Index Terms

1. Computer systems organization
  1. Embedded and cyber-physical systems
    1. Robotics
      1. Robotic control

Published In

ICCPR '23: Proceedings of the 2023 12th International Conference on Computing and Pattern Recognition

October 2023

589 pages

ISBN: 9798400707988

DOI: 10.1145/3633637

Copyright © 2023 ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

ICCPR 2023

ICCPR 2023: 2023 12th International Conference on Computing and Pattern Recognition

October 27 - 29, 2023

Qingdao, China
