
AGAM-SLAM: An Adaptive Dynamic Scene Semantic SLAM Method Based on GAM

  • Conference paper
Advanced Intelligent Computing Technology and Applications (ICIC 2023)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14090)


Abstract

With the rapid development of autonomous driving, robot navigation, and augmented reality, visual SLAM (VSLAM) has become a core enabling technology. VSLAM systems perform reliably in static scenes, but dynamic objects such as people, vehicles, and animals make consistent map building and localization more difficult and accurate trajectory estimation harder to achieve. In this paper, we propose a semantic VSLAM system based on the global attention mechanism (GAM) and adaptive thresholding. First, GAM improves the segmentation accuracy of the Mask R-CNN model for dynamic objects and eliminates their influence on the VSLAM system. In addition, adaptive thresholding generates an adaptive factor from the number of key points extracted in the scene and dynamically adjusts the FAST threshold, which enables more stable feature-point extraction in dynamic scenes. We verify our approach on the TUM public dataset and compare it with DynaSLAM. Both the absolute trajectory error (ATE) and the relative pose error (RPE) are reduced; on the W_rpy sequence in particular, the accuracy of our method is improved by 38.78%. The experimental results show that the proposed method significantly improves overall system performance in highly dynamic environments.
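The paper's exact adaptive rule is not reproduced on this page, so the following is only a minimal sketch of the idea stated in the abstract: derive an adaptive factor from the number of key points extracted in the current frame and use it to adjust the FAST corner threshold. The function name, the target keypoint count, the gain, and the clamping bounds are illustrative assumptions, not values from the paper.

```python
def adapt_fast_threshold(threshold, num_keypoints, target=1000,
                         gain=0.1, t_min=5, t_max=60):
    """Adjust the FAST threshold toward a target keypoint count.

    If the current frame yields too few keypoints (e.g. after dynamic
    objects are masked out), the threshold is lowered so weaker corners
    pass; if it yields too many, the threshold is raised.
    """
    # Relative deviation from the target acts as the adaptive factor.
    factor = (num_keypoints - target) / target
    new_threshold = threshold * (1.0 + gain * factor)
    # Clamp so the detector stays in a usable operating range.
    return max(t_min, min(t_max, new_threshold))
```

In a real pipeline this update would run once per frame, feeding the returned value back into the FAST detector for the next frame, so the extraction rate settles near the target even as scene texture changes.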

This work was supported in part by the Key Research and Development Project of Hainan Province (ZDYF2022GXJS348, ZDYF2022SHFZ039), the Hainan Province Natural Science Foundation (623RC446), and the National Natural Science Foundation of China (62161010, 61963012). The authors would like to thank the referees for their constructive suggestions.


References

  1. Chang, Y., Tian, Y., How, J.P., Carlone, L.: Kimera-Multi: a system for distributed multi-robot metric-semantic simultaneous localization and mapping. In: 2021 IEEE International Conference on Robotics and Automation (ICRA), pp. 11210−11218. IEEE (2021)

  2. Cheng, J., Zhang, L., Chen, Q., Hu, X., Cai, J.: A review of visual SLAM methods for autonomous driving vehicles. Eng. Appl. Artif. Intell. 114, 104992 (2022)

  3. Jinyu, L., Bangbang, Y., Danpeng, C., Nan, W., Guofeng, Z., Hujun, B.: Survey and evaluation of monocular visual-inertial SLAM algorithms for augmented reality. Virt. Real. Intell. Hardw. 1(4), 386−410 (2019)

  4. Liu, Y., Miura, J.: RDS-SLAM: real-time dynamic SLAM using semantic segmentation methods. IEEE Access 9, 23772−23785 (2021)

  5. Wang, H., Ko, J.Y., Xie, L.: Multi-modal semantic SLAM for complex dynamic environments (2022)

  6. Li, A., Wang, J., Xu, M., Chen, Z.: DP-SLAM: a visual SLAM with moving probability towards dynamic environments. Inf. Sci. 556, 128−142 (2021)

  7. Mur-Artal, R., Montiel, J.M.M., Tardós, J.D.: ORB-SLAM: a versatile and accurate monocular SLAM system. IEEE Trans. Robot. 31(5), 1147−1163 (2015)

  8. Mur-Artal, R., Tardós, J.D.: ORB-SLAM2: an open-source SLAM system for monocular, stereo, and RGB-D cameras. IEEE Trans. Robot. 33(5), 1255−1262 (2017)

  9. Yu, C., et al.: DS-SLAM: a semantic visual SLAM towards dynamic environments. In: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1168−1174. IEEE (2018)

  10. Zhao, C., Sun, L., Purkait, P., Duckett, T., Stolkin, R.: Learning monocular visual odometry with dense 3D mapping from dense 3D flow. In: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 6864−6871. IEEE (2018)

  11. Konda, K.R., Memisevic, R.: Learning visual odometry with a convolutional network. In: VISAPP, pp. 486−490 (2015)

  12. Wu, W., Guo, L., Gao, H., You, Z., Liu, Y., Chen, Z.: YOLO-SLAM: a semantic SLAM system towards dynamic environment with geometric constraint. Neural Comput. Appl. 1−16 (2022)

  13. Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement (2018)

  14. Bescos, B., Fácil, J.M., Civera, J., Neira, J.: DynaSLAM: tracking, mapping, and inpainting in dynamic scenes. IEEE Robot. Autom. Lett. 3(4), 4076−4083 (2018)

  15. Alismail, H., Kaess, M., Browning, B., Lucey, S.: Direct visual odometry in low light using binary descriptors. IEEE Robot. Autom. Lett. 2(2), 444−451 (2016)

  16. Ono, Y., Trulls, E., Fua, P., Yi, K.M.: LF-Net: learning local features from images. In: Advances in Neural Information Processing Systems, vol. 31 (2018)

  17. Liu, Y., Shao, Z., Hoffmann, N.: Global attention mechanism: retain information to enhance channel-spatial interactions (2021)

  18. He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2961−2969 (2017)

  19. Woo, S., Park, J., Lee, J.Y., Kweon, I.S.: CBAM: convolutional block attention module. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 3−19 (2018)

  20. Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440−1448 (2015)

  21. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770−778 (2016)

  22. Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740−755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48

  23. Sturm, J., Engelhard, N., Endres, F., Burgard, W., Cremers, D.: A benchmark for the evaluation of RGB-D SLAM systems. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 573−580. IEEE (2012)


Acknowledgements

We thank Shenzhen Umouse Technology Development Co., Ltd. for their support with equipment and experimental conditions.

Author information


Corresponding author

Correspondence to Zhuhua Hu.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Cai, D., Hu, Z., Li, R., Qi, H., Xiang, Y., Zhao, Y. (2023). AGAM-SLAM: An Adaptive Dynamic Scene Semantic SLAM Method Based on GAM. In: Huang, DS., Premaratne, P., Jin, B., Qu, B., Jo, KH., Hussain, A. (eds) Advanced Intelligent Computing Technology and Applications. ICIC 2023. Lecture Notes in Computer Science, vol 14090. Springer, Singapore. https://doi.org/10.1007/978-981-99-4761-4_3


  • DOI: https://doi.org/10.1007/978-981-99-4761-4_3

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-4760-7

  • Online ISBN: 978-981-99-4761-4

