Abstract
For the task of autonomous indoor parking, Visual-Inertial Simultaneous Localization And Mapping (SLAM) systems are expected to perform well by exploiting the complementary strengths of visual cameras and Inertial Measurement Units (IMUs). Comparing such competing SLAM systems requires publicly available datasets that offer an objective way to demonstrate the strengths and weaknesses of each system. However, high-quality datasets of this kind are surprisingly scarce, because acquiring groundtruth trajectories is profoundly challenging in Global Positioning System (GPS)-denied indoor parking environments. In this article, we establish BeVIS, a large-scale Benchmark dataset with Visual (front-view), Inertial, and Surround-view sensors for evaluating SLAM systems developed for autonomous indoor parking. It is the first dataset of its kind in which both the raw data and the groundtruth trajectories are available. In BeVIS, the groundtruth trajectories are obtained by tracking artificial landmarks scattered throughout the indoor parking environments, whose coordinates are surveyed with a high-precision electronic total station. Moreover, the groundtruth trajectories are comprehensively evaluated in two respects: reprojection error and pose volatility. Beyond BeVIS, we propose VISSLAM-2, a novel tightly coupled semantic SLAM framework leveraging visual (front-view), inertial, and surround-view sensor modalities, designed specifically for autonomous indoor parking. It is the first work attempting to provide a general form for modeling various semantic objects on the ground. Experiments on BeVIS demonstrate the effectiveness of the proposed VISSLAM-2. Our benchmark dataset BeVIS is publicly available at https://shaoxuan92.github.io/BeVIS.
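To illustrate the reprojection-error check mentioned above, the sketch below computes the mean pixel reprojection error of surveyed 3D landmarks under an estimated camera pose. This is a generic illustration, not the paper's implementation: the pinhole model, the function name, and all variable names are assumptions, and lens distortion is ignored for brevity.

```python
import numpy as np

def mean_reprojection_error(K, R, t, landmarks_world, observations_px):
    """Mean reprojection error (in pixels) of known 3D landmarks.

    K                -- 3x3 pinhole camera intrinsics
    R, t             -- world-to-camera rotation (3x3) and translation (3,)
    landmarks_world  -- Nx3 landmark coordinates (e.g., surveyed with a
                        high-precision electronic total station)
    observations_px  -- Nx2 observed pixel coordinates of those landmarks
    """
    # Transform landmarks into the camera frame, then apply the intrinsics.
    cam = landmarks_world @ R.T + t
    proj = cam @ K.T
    # Perspective division yields pixel coordinates.
    proj = proj[:, :2] / proj[:, 2:3]
    # Average Euclidean distance between projected and observed pixels.
    return float(np.mean(np.linalg.norm(proj - observations_px, axis=1)))
```

A low mean error indicates that the recovered groundtruth pose is geometrically consistent with the surveyed landmark coordinates; pose volatility would additionally examine frame-to-frame smoothness of the trajectory.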
Index Terms
- SLAM for Indoor Parking: A Comprehensive Benchmark Dataset and a Tightly Coupled Semantic Framework