Abstract
Most traditional visual simultaneous localization and mapping (v-SLAM) algorithms assume a static environment, which limits their use in dynamic scenes. Furthermore, many applications, such as autonomous driving, robot collaboration and AR/VR, need to track the moving objects in the environment. In this work, we propose a v-SLAM method that tracks multiple objects in dynamic environments by integrating a 3D object detection thread into the ORB-SLAM2 framework. Dynamic objects are detected and tracked in three steps. First, 3D object detection is performed on the current frame, and each 3D bounding box is projected into a bird's-eye view. Second, detections are associated with existing objects based on each object's motion state and its bounding box in the bird's-eye view. Third, the objects are tracked and the feature points that fall in dynamic regions are removed. In addition, we set up a multi-view constraint adjustment for static objects to jointly optimize the camera poses, object poses, and map points. Experiments on the KITTI-odom and KITTI-raw datasets verify the performance of our method in challenging scenarios. We demonstrate that dynamic object tracking not only provides useful information for scene understanding but also helps to improve camera tracking.
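The association step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract states only that association uses the object's motion state and its bird's-eye-view bounding box. The sketch assumes axis-aligned footprints on the ground plane, a constant-velocity motion model, greedy matching by intersection-over-union (IoU), and a hypothetical threshold `min_iou`. All class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class BevBox:
    """Axis-aligned footprint of a 3D box on the ground plane (assumed simplification)."""
    cx: float      # centre along the lateral axis, metres
    cz: float      # centre along the forward axis, metres
    length: float  # extent along the lateral axis
    width: float   # extent along the forward axis

    def bounds(self):
        """Return (x_min, z_min, x_max, z_max)."""
        return (self.cx - self.length / 2, self.cz - self.width / 2,
                self.cx + self.length / 2, self.cz + self.width / 2)


@dataclass
class Track:
    """A tracked object with a constant-velocity motion state (assumed model)."""
    track_id: int
    box: BevBox
    vx: float = 0.0
    vz: float = 0.0

    def predict(self, dt):
        """Shift the last known footprint by velocity * dt."""
        return BevBox(self.box.cx + self.vx * dt, self.box.cz + self.vz * dt,
                      self.box.length, self.box.width)


def bev_iou(a, b):
    """Intersection-over-union of two axis-aligned bird's-eye-view boxes."""
    ax0, az0, ax1, az1 = a.bounds()
    bx0, bz0, bx1, bz1 = b.bounds()
    overlap_x = min(ax1, bx1) - max(ax0, bx0)
    overlap_z = min(az1, bz1) - max(az0, bz0)
    if overlap_x <= 0 or overlap_z <= 0:
        return 0.0
    inter = overlap_x * overlap_z
    union = a.length * a.width + b.length * b.width - inter
    return inter / union


def associate(tracks, detections, dt, min_iou=0.3):
    """Greedily match predicted track footprints to detections by IoU.

    Returns a dict mapping track_id to the index of the matched detection.
    """
    candidates = []
    for ti, track in enumerate(tracks):
        predicted = track.predict(dt)
        for di, det in enumerate(detections):
            iou = bev_iou(predicted, det)
            if iou >= min_iou:
                candidates.append((iou, ti, di))
    candidates.sort(reverse=True)  # best overlaps first

    matches, used_tracks, used_dets = {}, set(), set()
    for iou, ti, di in candidates:
        if ti in used_tracks or di in used_dets:
            continue
        matches[tracks[ti].track_id] = di
        used_tracks.add(ti)
        used_dets.add(di)
    return matches
```

Detections left unmatched would typically start new tracks, and feature points that project inside a matched moving object's region would be excluded from camera pose estimation. A rotated-box IoU or an optimal assignment method such as the Hungarian algorithm could replace the greedy matching used here.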
Data availability
The authors declare that the information presented in this article is intended for preferential use by the Journal of Intelligent and Robotic Systems—JINT.
Acknowledgements
We would like to thank the Center for Public Security Technology.
Funding
The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.
Author information
Contributions
All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by CH and MZ. CH and MZ contributed equally to this manuscript. The first draft of the manuscript was written by CH, and all authors commented on previous versions of the manuscript. Revision of the manuscript and data optimization were performed by ZJ, CY and ZW. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
About this article
Cite this article
Hong, C., Zhong, M., Jia, Z. et al. A stereo vision SLAM with moving vehicles tracking in outdoor environment. Machine Vision and Applications 35, 5 (2024). https://doi.org/10.1007/s00138-023-01488-x
DOI: https://doi.org/10.1007/s00138-023-01488-x