Object-aware data association for the semantically constrained visual SLAM

Liu, Yang; Guo, Chi; Wang, Yingli

doi:10.1007/s11370-023-00455-9

Original Research Paper
Published: 15 February 2023

Volume 16, pages 155–176, (2023)
Intelligent Service Robotics

Yang Liu¹,
Chi Guo¹,²,³ &
Yingli Wang⁴

749 Accesses
3 Citations

Abstract

Traditional vSLAM methods extract feature points from images for tracking, and the data association of these points relies on low-level geometric cues. When points are observed from varying viewpoints, these cues are not robust for matching. In contrast, semantic information remains consistent across changes in viewpoint and observed scale. Semantic vSLAM methods have therefore gained more attention in recent years. In particular, object-level semantic information can be used to model the environment as object landmarks, and it has been fused into many vSLAM methods, which are called object-level vSLAM methods. Two key problems for object-level vSLAM methods are how to associate objects over consecutive images and how to use object information in pose estimation. In this work, we propose an object-level vSLAM method that solves the object-level data association problem and estimates camera poses using object semantic constraints. We present an object-level data association scheme that considers object appearance and the geometry of point landmarks, and that processes object matching and point matching together so that each improves the other. We propose a semantic re-projection error function based on object-level semantic information and integrate it into the pose optimization, establishing longer-term constraints. We performed experiments on public datasets that include both indoor and outdoor scenes. The evaluation results demonstrate that our method achieves high accuracy in object-level data association and outperforms the baseline method in pose estimation. An open-source version of the code is also available.
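To make the idea of combining object appearance with point-landmark geometry more concrete, the following is a minimal sketch of one way such an association step could look. It is not the authors' implementation: the abstract does not specify the scoring function, so the histogram-intersection appearance term, the shared-point (Jaccard) geometry term, the weight `w_app`, the threshold `min_score`, the greedy matching, and all names (`ObjectObservation`, `associate`, and so on) are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ObjectObservation:
    """Hypothetical container for a detected object or a mapped object landmark."""
    label: str                       # semantic class from the detector
    hist: List[float]                # normalized appearance histogram (e.g. over hue bins)
    point_ids: Set[int] = field(default_factory=set)  # ids of point landmarks on the object


def histogram_intersection(h1: List[float], h2: List[float]) -> float:
    # Appearance similarity in [0, 1] for histograms that each sum to 1.
    return sum(min(a, b) for a, b in zip(h1, h2))


def shared_point_ratio(p1: Set[int], p2: Set[int]) -> float:
    # Geometric support: Jaccard overlap of the matched point landmarks.
    union = p1 | p2
    return len(p1 & p2) / len(union) if union else 0.0


def association_score(det: ObjectObservation, lm: ObjectObservation,
                      w_app: float = 0.5) -> float:
    # Objects of different classes are never associated (assumed rule).
    if det.label != lm.label:
        return 0.0
    appearance = histogram_intersection(det.hist, lm.hist)
    geometry = shared_point_ratio(det.point_ids, lm.point_ids)
    return w_app * appearance + (1.0 - w_app) * geometry


def associate(detections: List[ObjectObservation],
              landmarks: List[ObjectObservation],
              min_score: float = 0.4) -> Dict[int, int]:
    """Greedy one-to-one matching: detection index -> landmark index."""
    candidates = sorted(
        ((association_score(d, l), i, j)
         for i, d in enumerate(detections)
         for j, l in enumerate(landmarks)),
        reverse=True,
    )
    matches: Dict[int, int] = {}
    used_landmarks: Set[int] = set()
    for score, i, j in candidates:
        if score < min_score:
            break  # remaining candidates are even weaker
        if i in matches or j in used_landmarks:
            continue
        matches[i] = j
        used_landmarks.add(j)
    return matches


if __name__ == "__main__":
    landmarks = [
        ObjectObservation("chair", [0.5, 0.5, 0.0], {1, 2, 3}),
        ObjectObservation("monitor", [0.0, 0.2, 0.8], {10, 11}),
    ]
    detections = [
        ObjectObservation("monitor", [0.0, 0.25, 0.75], {10, 11, 12}),
        ObjectObservation("chair", [0.45, 0.55, 0.0], {2, 3, 4}),
        ObjectObservation("cup", [1.0, 0.0, 0.0], set()),
    ]
    print(associate(detections, landmarks))
```

In this sketch an unmatched detection (the cup) is simply left out of the result; a full system would presumably create a new object landmark for it. A globally optimal assignment could replace the greedy loop, at the cost of an extra dependency.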

Data availability

The data and code that support the findings of this study are not openly available because they are being used in further research; they are available from the corresponding author upon reasonable request.

Funding

This work was supported by the National Key Research and Development Program of China (2018YFB1305001), the Hubei Major Scientific and Technological Project (2021AAA010), and the Open Fund of Hubei Luojia Laboratory.

Author information

Authors and Affiliations

GNSS Research Center, Wuhan University, Bayi, Wuhan, 430072, Hubei, China
Yang Liu & Chi Guo
The Artificial Intelligence Institute, Wuhan University, Bayi, Wuhan, 430072, Hubei, China
Chi Guo
Hubei Luojia Laboratory, Luoyu, Wuhan, 430079, Hubei, China
Chi Guo
School of Geodesy and Geomatics, Wuhan University, Luojiashan, Wuhan, 430072, Hubei, China
Yingli Wang

Contributions

All authors contributed to the methodology conception and design. YL designed the methodology, collected the data, performed the experiments and analysis, and wrote the manuscript. CG designed the methodology, reviewed and approved the manuscript. YW reviewed and approved the manuscript.

Corresponding author

Correspondence to Chi Guo.

Ethics declarations

Competing interest

The authors declare that they have no competing interest.

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, Y., Guo, C. & Wang, Y. Object-aware data association for the semantically constrained visual SLAM. Intel Serv Robotics 16, 155–176 (2023). https://doi.org/10.1007/s11370-023-00455-9

Received: 20 May 2022
Accepted: 30 November 2022
Published: 15 February 2023
Issue Date: April 2023
DOI: https://doi.org/10.1007/s11370-023-00455-9
