
SLAM for Indoor Parking: A Comprehensive Benchmark Dataset and a Tightly Coupled Semantic Framework


Abstract

For the task of autonomous indoor parking, Visual-Inertial Simultaneous Localization And Mapping (SLAM) systems are expected to achieve strong results by exploiting the complementary strengths of visual cameras and Inertial Measurement Units. Comparing such competing SLAM systems requires publicly available datasets that offer an objective way to demonstrate the pros and cons of each system. However, the availability of such high-quality datasets is surprisingly limited due to the profound challenge of acquiring groundtruth trajectories in Global Positioning System (GPS)-denied indoor parking environments. In this article, we establish BeVIS, a large-scale Benchmark dataset with Visual (front-view), Inertial, and Surround-view sensors for evaluating the performance of SLAM systems developed for autonomous indoor parking; it is the first of its kind in which both the raw data and the groundtruth trajectories are available. In BeVIS, the groundtruth trajectories are obtained by tracking artificial landmarks scattered throughout the indoor parking environments, whose coordinates are surveyed with a high-precision Electronic Total Station. Moreover, the groundtruth trajectories are comprehensively evaluated in two respects: reprojection error and pose volatility. Apart from BeVIS, we propose a novel tightly coupled semantic SLAM framework, namely VISSLAM-2, leveraging Visual (front-view), Inertial, and Surround-view sensor modalities, specifically for the task of autonomous indoor parking. It is the first work attempting to provide a general form for modeling various semantic objects on the ground. Experiments on BeVIS demonstrate the effectiveness of the proposed VISSLAM-2. Our benchmark dataset BeVIS is publicly available at https://shaoxuan92.github.io/BeVIS.
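As a rough illustration of the reprojection-error check mentioned above, the sketch below computes the mean pixel reprojection error of surveyed 3D landmark coordinates under an estimated camera pose. It is a minimal sketch assuming a pinhole camera model; the function name, array layouts, and variable names are hypothetical and are not taken from the paper or its released toolkit.

    import numpy as np

    def reprojection_error(K, R, t, landmarks_world, pixels_observed):
        """Mean reprojection error (pixels) of surveyed landmarks under pose (R, t).

        K: 3x3 camera intrinsics
        R, t: rotation (3x3) and translation (3,) mapping world points into the camera frame
        landmarks_world: (N, 3) surveyed landmark coordinates (e.g., from a total station)
        pixels_observed: (N, 2) detected landmark centers in the image
        """
        pts_cam = R @ landmarks_world.T + t.reshape(3, 1)   # world -> camera frame, shape (3, N)
        proj = K @ pts_cam                                   # apply intrinsics
        proj = (proj[:2] / proj[2]).T                        # perspective division, shape (N, 2)
        return float(np.mean(np.linalg.norm(proj - pixels_observed, axis=1)))

Averaging this quantity over all landmarks visible in each frame would give one indicator of groundtruth-trajectory quality; pose volatility would instead look at the smoothness of the pose estimates from frame to frame.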



Published in

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 19, Issue 1
January 2023, 505 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3572858
Editor: Abdulmotaleb El Saddik

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 5 January 2023
      • Online AM: 18 February 2022
      • Accepted: 9 January 2022
      • Revised: 28 November 2021
      • Received: 28 July 2021
Published in TOMM Volume 19, Issue 1


      Qualifiers

      • research-article
      • Refereed
