Abstract
In this paper we present a novel semi-direct tracking and mapping (SDTAM) approach for RGB-D cameras that inherits the advantages of both direct and feature-based methods and consequently achieves high efficiency, accuracy, and robustness. Input RGB-D frames are tracked with a direct method, and keyframes are refined by minimizing a proposed measurement residual function that takes both geometric and depth information into account. A local optimization refines the local map, while a global optimization detects and corrects loops using an appearance-based bag of words and a co-visibility weighted pose graph. Our method achieves higher accuracy in both trajectory tracking and surface reconstruction than state-of-the-art frame-to-frame and frame-to-model approaches. We test our system on challenging sequences with motion blur, fast pure rotation, and large moving objects; the results demonstrate that it still obtains highly accurate results. Furthermore, the proposed approach runs in real time using only part of the CPU's computational power, and it can be applied to embedded devices such as phones, tablets, or micro aerial vehicles (MAVs).
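To make the idea of a residual that combines geometric and depth information concrete, the following is a minimal sketch and not the residual function defined in the paper. It assumes a pinhole camera, a map point expressed in world coordinates, and a hypothetical weight `w_depth`; the function names and the intrinsics in the usage lines are illustrative assumptions only.

```python
def transform(R, t, p):
    """Apply the rigid-body motion p_cam = R * p + t (R given as three rows)."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def project(fx, fy, cx, cy, p_cam):
    """Pinhole projection of a 3D point in camera coordinates to pixel coordinates."""
    x, y, z = p_cam
    return (fx * x / z + cx, fy * y / z + cy)


def measurement_residual(intrinsics, R, t, p_world, uv_obs, depth_obs, w_depth=1.0):
    """Hypothetical 3-vector residual for one map point observed in one keyframe.

    The first two components are the reprojection (geometric) error in pixels;
    the third is the weighted difference between predicted and measured depth.
    This form is an assumption for illustration, not the paper's definition.
    """
    p_cam = transform(R, t, p_world)
    u, v = project(*intrinsics, p_cam)
    return (u - uv_obs[0], v - uv_obs[1], w_depth * (p_cam[2] - depth_obs))


# Illustrative values: (fx, fy, cx, cy) and an identity camera pose.
intrinsics = (525.0, 525.0, 319.5, 239.5)
R_identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
t_zero = (0.0, 0.0, 0.0)
p_world = (0.2, -0.1, 2.0)

# A measurement that agrees with the prediction gives a residual near zero.
r_consistent = measurement_residual(
    intrinsics, R_identity, t_zero, p_world, (372.0, 213.25), 2.0)

# A depth reading that is 0.1 m too short only affects the third component.
r_depth_off = measurement_residual(
    intrinsics, R_identity, t_zero, p_world, (372.0, 213.25), 1.9)
```

In a keyframe refinement of this kind, residuals such as these would be stacked over all observed points and minimized over the keyframe pose with a nonlinear least-squares solver; the relative weight of the depth term is a design choice that this sketch leaves as a free parameter.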
Notes
3 The NPU dataset is publicly available at http://adv-ci.com/rgbd/npu/
Acknowledgments
This work is partly supported by grants from the National Natural Science Foundation of China (61202185, 61473231, 61573284), the Fundamental Research Funds for the Central Universities (310201401-(JCQ01009,JCQ01012)), and the Open Projects Program of the National Laboratory of Pattern Recognition (NLPR).
Cite this article
Bu, S., Zhao, Y., Wan, G. et al. Semi-direct tracking and mapping with RGB-D camera for MAV. Multimed Tools Appl 76, 4445–4469 (2017). https://doi.org/10.1007/s11042-016-3524-x
DOI: https://doi.org/10.1007/s11042-016-3524-x