Abstract
Traditional visual SLAM (vSLAM) methods track feature points extracted from images and associate those points using low-level geometric cues. When points are observed from widely varying viewpoints, these cues are not robust for matching. In contrast, semantic information remains consistent across changes in viewpoint and observation scale, and semantic vSLAM methods have therefore gained attention in recent years. In particular, object-level semantic information can be used to model the environment as object landmarks, and it has been fused into many vSLAM methods, referred to as object-level vSLAM methods. Two key problems for these methods are how to associate objects across consecutive images and how to use object information in pose estimation. In this work, we propose an object-level vSLAM method that addresses object-level data association and estimates camera poses using object semantic constraints. We present an object-level data association scheme that considers both object appearance and the geometry of point landmarks, and that matches objects and points jointly so that each improves the other. We propose a semantic re-projection error function based on object-level semantic information and integrate it into pose optimization, which establishes longer-term constraints. We performed experiments on public datasets covering both indoor and outdoor scenes. The evaluation results show that our method achieves high accuracy in object-level data association and outperforms the baseline method in pose estimation. An open-source version of the code is also available.
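To make the idea of combining object appearance with point-landmark geometry more concrete, the following is a minimal illustrative sketch and not the implementation described in the paper. It assumes that each object carries a class label, a color histogram, and the set of IDs of the point landmarks that fall on it. The function names, the weight `w_app`, the threshold `min_score`, the Bhattacharyya and Jaccard scores, and the greedy matching are all hypothetical choices made for this example.

```python
from math import sqrt


def histogram_similarity(h1, h2):
    """Bhattacharyya coefficient of two histograms (1.0 means identical shape)."""
    s1, s2 = sum(h1), sum(h2)
    if s1 == 0 or s2 == 0:
        return 0.0
    return sum(sqrt((a / s1) * (b / s2)) for a, b in zip(h1, h2))


def shared_point_ratio(ids_a, ids_b):
    """Jaccard overlap between two sets of point-landmark IDs."""
    a, b = set(ids_a), set(ids_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def associate(detections, landmarks, w_app=0.5, min_score=0.4):
    """Greedy one-to-one matching of detections to object landmarks.

    The score is a weighted sum of appearance similarity and point overlap
    (an assumed combination, chosen only for illustration).
    Returns a dict {detection_index: landmark_index}.
    """
    candidates = []
    for d_idx, det in enumerate(detections):
        for l_idx, lm in enumerate(landmarks):
            if det["label"] != lm["label"]:
                continue  # only objects of the same class can match
            score = (w_app * histogram_similarity(det["hist"], lm["hist"])
                     + (1.0 - w_app) * shared_point_ratio(det["point_ids"],
                                                          lm["point_ids"]))
            if score >= min_score:
                candidates.append((score, d_idx, l_idx))

    # Accept the best-scoring pairs first, using each index at most once.
    candidates.sort(reverse=True)
    matches, used_d, used_l = {}, set(), set()
    for score, d_idx, l_idx in candidates:
        if d_idx in used_d or l_idx in used_l:
            continue
        matches[d_idx] = l_idx
        used_d.add(d_idx)
        used_l.add(l_idx)
    return matches


# Toy data: two chair landmarks with different colors, one monitor landmark.
landmarks = [
    {"label": "chair", "hist": [8, 1, 1], "point_ids": {1, 2, 3, 4}},
    {"label": "chair", "hist": [1, 1, 8], "point_ids": {10, 11, 12}},
    {"label": "monitor", "hist": [5, 5, 0], "point_ids": {20, 21}},
]
detections = [
    {"label": "chair", "hist": [1, 2, 7], "point_ids": {11, 12, 13}},
    {"label": "chair", "hist": [7, 2, 1], "point_ids": {2, 3, 4, 5}},
    {"label": "cup", "hist": [3, 3, 4], "point_ids": {30}},
]
matches = associate(detections, landmarks)
```

In this toy example, detection 0 is paired with landmark 1 and detection 1 with landmark 0, while the cup detection stays unmatched and would be a candidate for a new object landmark. A real system could replace the greedy step with an optimal assignment and feed the object matches back into point matching, in the spirit of the mutual improvement mentioned in the abstract.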
Data availability
The data and code that support the findings of this study are not openly available due to application for further research and are available from the corresponding author upon reasonable request.
Funding
This work was supported by a grant from the National Key Research and Development Program of China (2018YFB1305001), the Hubei Major scientific and technological project (2021AAA010), and the Open Fund of Hubei Luojia Laboratory.
Author information
Contributions
All authors contributed to the conception and design of the methodology. YL designed the methodology, collected the data, performed the experiments and analysis, and wrote the manuscript. CG designed the methodology and reviewed and approved the manuscript. YW reviewed and approved the manuscript.
Ethics declarations
Competing interest
The authors declare that they have no competing interest.
Ethics approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
About this article
Cite this article
Liu, Y., Guo, C. & Wang, Y. Object-aware data association for the semantically constrained visual SLAM. Intel Serv Robotics 16, 155–176 (2023). https://doi.org/10.1007/s11370-023-00455-9
DOI: https://doi.org/10.1007/s11370-023-00455-9