Abstract
Dense SLAM is an important application in embedded environments. However, embedded platforms usually cannot provide enough computational resources for high-accuracy, real-time dense SLAM, even with highly parallel architectures such as GPUs. One way to tackle this problem is to design suitable approximation techniques for dense SLAM on embedded GPUs. In this work, we propose two novel approximation techniques: critical data identification and redundant branch elimination. We also analyze the error characteristics of two other techniques, loop skipping and thread approximation. We then propose SLaPP, an online adaptive approximation controller that aims to keep the error below an acceptable threshold. Our evaluation shows that SLaPP achieves a 2.0× speedup and 30% energy savings on average compared with execution without approximation.
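To make the idea of loop skipping under online error control concrete, the following is a minimal sketch in Python. It is not SLaPP's algorithm, which the abstract does not describe. The names `perforated_mean` and `ThresholdController`, the halving back-off policy, and the `max_stride` parameter are all assumptions chosen for illustration only.

```python
from typing import Sequence


def perforated_mean(values: Sequence[float], stride: int) -> float:
    """Loop skipping: average only every `stride`-th element of `values`."""
    if stride < 1:
        raise ValueError("stride must be >= 1")
    if not values:
        raise ValueError("values must be non-empty")
    sampled = values[::stride]  # stride == 1 visits every element (exact result)
    return sum(sampled) / len(sampled)


class ThresholdController:
    """Hypothetical controller (an assumption, not the paper's design).

    It approximates more aggressively while the observed error stays at or
    below the threshold, and backs off quickly once the error exceeds it.
    """

    def __init__(self, threshold: float, max_stride: int = 8) -> None:
        self.threshold = threshold
        self.max_stride = max_stride
        self.stride = 1  # start with no approximation

    def update(self, observed_error: float) -> int:
        """Return the stride to use for the next frame."""
        if observed_error > self.threshold:
            # Error too high: halve the stride, never going below exact execution.
            self.stride = max(1, self.stride // 2)
        elif self.stride < self.max_stride:
            # Error acceptable: skip slightly more work next time.
            self.stride += 1
        return self.stride
```

In such a scheme the controller would be called once per frame with an error estimate, and the returned stride would set how much work the next frame skips. The asymmetric policy (slow increase, fast decrease) is one possible design choice that favors staying under the error threshold over maximizing speedup.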
Index Terms
- Towards Fine-Grained Online Adaptive Approximation Control for Dense SLAM on Embedded GPUs