
SLAM for Indoor Parking: A Comprehensive Benchmark Dataset and a Tightly Coupled Semantic Framework

Published: 05 January 2023

Abstract

For the task of autonomous indoor parking, Visual-Inertial Simultaneous Localization And Mapping (SLAM) systems are expected to achieve competitive results by exploiting the complementary strengths of visual cameras and Inertial Measurement Units. Comparing such competing SLAM systems requires publicly available datasets, which offer an objective way to demonstrate the pros and cons of each system. However, the availability of such high-quality datasets is surprisingly limited, owing to the profound challenge of acquiring groundtruth trajectories in Global Positioning System (GPS)-denied indoor parking environments. In this article, we establish BeVIS, a large-scale Benchmark dataset with Visual (front-view), Inertial, and Surround-view sensors for evaluating the performance of SLAM systems developed for autonomous indoor parking; it is the first of its kind in which both the raw data and the groundtruth trajectories are available. In BeVIS, the groundtruth trajectories are obtained by tracking artificial landmarks scattered throughout the indoor parking environments, whose coordinates are surveyed with a high-precision Electronic Total Station. Moreover, the groundtruth trajectories are comprehensively evaluated in two respects: reprojection error and pose volatility. Apart from BeVIS, we propose a novel tightly coupled semantic SLAM framework, namely VISSLAM-2, leveraging Visual (front-view), Inertial, and Surround-view sensor modalities, specifically for the task of autonomous indoor parking. It is the first work attempting to provide a general form to model various semantic objects on the ground. Experiments on BeVIS demonstrate the effectiveness of the proposed VISSLAM-2. Our benchmark dataset BeVIS is publicly available at https://shaoxuan92.github.io/BeVIS.
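
The abstract does not detail how the reprojection-error check on the groundtruth trajectories is computed. As a rough, hypothetical illustration (all function and variable names below are assumptions, not the authors' released code), the following minimal Python sketch projects surveyed landmark coordinates through each estimated camera pose and averages the pixel discrepancy against the observed landmark detections:

    import numpy as np

    # Hypothetical sketch: score a trajectory by the reprojection error of
    # surveyed landmarks (data layout assumed, not taken from the paper).

    def project(K, R, t, X):
        # Pinhole projection of a world-frame 3D point X into pixel
        # coordinates, given intrinsics K and a world-to-camera pose (R, t).
        x_cam = R @ X + t
        x_img = K @ x_cam
        return x_img[:2] / x_img[2]

    def mean_reprojection_error(K, poses, observations, landmarks):
        # poses: list of (R, t) world-to-camera poses along the trajectory.
        # observations: per frame, a dict of landmark id -> observed pixel (u, v).
        # landmarks: dict of landmark id -> surveyed 3D coordinates (e.g., from
        # a total station), in the same world frame as the poses.
        errors = []
        for (R, t), obs in zip(poses, observations):
            for lid, uv in obs.items():
                uv_hat = project(K, R, t, landmarks[lid])
                errors.append(np.linalg.norm(uv_hat - np.asarray(uv, dtype=float)))
        return float(np.mean(errors)) if errors else float("nan")

A low mean error under such a check indicates that the surveyed landmark positions and the recovered camera poses are geometrically consistent.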


Cited By

  • (2024) AVM-SLAM: Semantic Visual SLAM with Multi-Sensor Fusion in a Bird's Eye View for Automated Valet Parking. 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 7937–7943. DOI: 10.1109/IROS58592.2024.10802668. Online publication date: 14-Oct-2024.
  • (2024) Dynamic indoor mapping for AVP: Crowdsourcing mapping without prior maps. IET Intelligent Transport Systems 18, 12, 2397–2408. DOI: 10.1049/itr2.12578. Online publication date: 14-Oct-2024.
  • (2024) Multi-camera Visual-Inertial Simultaneous Localization and Mapping for Autonomous Valet Parking. Experimental Robotics, 567–581. DOI: 10.1007/978-3-031-63596-0_51. Online publication date: 6-Aug-2024.


    Published In

    ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 19, Issue 1
    January 2023
    505 pages
    ISSN: 1551-6857
    EISSN: 1551-6865
    DOI: 10.1145/3572858
    • Editor: Abdulmotaleb El Saddik

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 05 January 2023
    Online AM: 18 February 2022
    Accepted: 09 January 2022
    Revised: 28 November 2021
    Received: 28 July 2021
    Published in TOMM Volume 19, Issue 1


    Author Tags

    1. Autonomous indoor parking
    2. benchmark dataset
    3. groundtruth trajectory acquisition
    4. Electronic Total Station
    5. semantic SLAM

    Qualifiers

    • Research-article
    • Refereed

    Funding Sources

    • National Natural Science Foundation of China
    • Natural Science Foundation of Shanghai
    • Shanghai Science and Technology Innovation Plan
    • Dawn Program of Shanghai Municipal Education Commission
    • Shanghai Municipal Science and Technology Major Project
    • Fundamental Research Funds for the Central Universities


    Article Metrics

    • Downloads (Last 12 months): 213
    • Downloads (Last 6 weeks): 25
    Reflects downloads up to 03 Mar 2025

