
Object-aware data association for the semantically constrained visual SLAM

  • Original Research Paper
  • Published in: Intelligent Service Robotics

Abstract

Traditional vSLAM methods extract feature points from images for tracking, and the data association of points relies on low-level geometric cues. When points are observed from different viewpoints, these cues are not robust for matching. In contrast, semantic information remains consistent across changes in viewpoint and observation scale. Therefore, semantic vSLAM methods have gained increasing attention in recent years. In particular, object-level semantic information can be used to model the environment as object landmarks and has been fused into many vSLAM methods, which are called object-level vSLAM methods. How to associate objects across consecutive images and how to use object information in pose estimation are two key problems for object-level vSLAM methods. In this work, we propose an object-level vSLAM method that solves the object-level data association problem and estimates camera poses using object semantic constraints. We present an object-level data association scheme that considers both object appearance and the geometry of point landmarks, matching objects and points jointly for mutual improvement. We propose a semantic re-projection error function based on object-level semantic information and integrate it into the pose optimization, establishing longer-term constraints. We conducted experiments on public datasets covering both indoor and outdoor scenes. The evaluation results demonstrate that our method achieves high accuracy in object-level data association and outperforms the baseline method in pose estimation. An open-source version of the code is also available.
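As a rough illustration of the kind of object-level association the abstract describes, the following sketch matches per-frame detections against existing object landmarks by combining an appearance cue with a geometric overlap cue. The helper names, weights, thresholds, and the greedy matching strategy are illustrative assumptions, not the implementation reported in the paper.

    # Minimal sketch (not the authors' code): greedy object-level association
    # that fuses an appearance score with 2-D box overlap. Weights and
    # thresholds are hypothetical.
    import numpy as np

    def iou(a, b):
        # Intersection-over-union of two [x1, y1, x2, y2] boxes.
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2, y2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        area = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area - inter + 1e-9)

    def associate(detections, landmarks, w_geo=0.6, w_app=0.4, min_score=0.3):
        # detections/landmarks: dicts with "label", "box", and a normalized
        # appearance histogram "hist" (e.g. an HSV color histogram).
        scores = np.zeros((len(detections), len(landmarks)))
        for i, det in enumerate(detections):
            for j, lmk in enumerate(landmarks):
                if det["label"] != lmk["label"]:
                    continue  # semantic class must agree
                app = float(np.minimum(det["hist"], lmk["hist"]).sum())
                scores[i, j] = w_geo * iou(det["box"], lmk["box"]) + w_app * app
        matches = []
        while scores.size and scores.max() > min_score:
            i, j = np.unravel_index(int(scores.argmax()), scores.shape)
            matches.append((int(i), int(j)))
            scores[i, :] = 0.0   # enforce one-to-one matching
            scores[:, j] = 0.0
        return matches

Unmatched detections would then spawn new object landmarks, while accepted matches can feed back into point matching and the pose optimization, as the abstract outlines.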


Data availability

The data and code that support the findings of this study are not openly available because they are being used in further research; they are available from the corresponding author upon reasonable request.


Funding

This work was supported by a grant from the National Key Research and Development Program of China (2018YFB1305001), the Hubei Major Scientific and Technological Project (2021AAA010), and the Open Fund of Hubei Luojia Laboratory.

Author information


Contributions

All authors contributed to the methodology conception and design. YL designed the methodology, collected the data, performed the experiments and analysis, and wrote the manuscript. CG designed the methodology, reviewed and approved the manuscript. YW reviewed and approved the manuscript.

Corresponding author

Correspondence to Chi Guo.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Liu, Y., Guo, C. & Wang, Y. Object-aware data association for the semantically constrained visual SLAM. Intel Serv Robotics 16, 155–176 (2023). https://doi.org/10.1007/s11370-023-00455-9

