Semi-direct tracking and mapping with RGB-D camera for MAV

Published: 22 April 2016

Volume 76, pages 4445–4469 (2017)

Multimedia Tools and Applications

Shuhui Bu¹,
Yong Zhao¹,
Gang Wan²,
Ke Li²,
Gong Cheng¹ &
Zhenbao Liu¹

Abstract

In this paper we present a novel semi-direct tracking and mapping (SDTAM) approach for RGB-D cameras that inherits the advantages of both direct and feature-based methods and consequently achieves high efficiency, accuracy, and robustness. Input RGB-D frames are tracked with a direct method, and keyframes are refined by minimizing a proposed measurement residual function that takes both geometric and depth information into account. Local optimization refines the local map, while global optimization detects and corrects loops using an appearance-based bag-of-words model and a co-visibility weighted pose graph. Our method achieves higher accuracy in both trajectory tracking and surface reconstruction than state-of-the-art frame-to-frame and frame-to-model approaches. We also test the system on challenging sequences with motion blur, fast pure rotation, and large moving objects; the results demonstrate that it still obtains accurate results. Furthermore, the proposed approach runs in real time while using only part of the CPU's computational power, so it can be applied to embedded devices such as phones, tablets, or micro aerial vehicles (MAVs).
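The abstract states only that frames are tracked with a direct method; it does not give the formulation. As a rough illustration of what direct tracking means in general (not SDTAM's actual tracker or its proposed residual), the sketch below warps every pixel of a reference frame into the current frame using its depth and a candidate pose, and scores the pose by the photometric (intensity) error. The intrinsics, the synthetic planar scene, and the one-dimensional grid search over a camera x-offset are hypothetical simplifications chosen to keep the example small; practical direct trackers minimize over the full 6-DoF pose, usually with Gauss-Newton or Levenberg-Marquardt on an image pyramid.

```python
import math

# Hypothetical pinhole intrinsics for a 32x32 toy image (not values from the paper).
FX, FY, CX, CY = 100.0, 100.0, 16.0, 16.0
W, H = 32, 32


def backproject(u, v, z):
    """Lift pixel (u, v) with depth z to a 3-D point in the camera frame."""
    return ((u - CX) * z / FX, (v - CY) * z / FY, z)


def project(p):
    """Pinhole projection of a camera-frame point to pixel coordinates."""
    x, y, z = p
    return FX * x / z + CX, FY * y / z + CY


def transform(R, t, p):
    """Rigid transform q = R p + t, with R a row-major 3x3 nested list."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def bilinear(img, u, v):
    """Bilinearly sample img at sub-pixel (u, v); None if outside the image."""
    if not (0 <= u < W - 1 and 0 <= v < H - 1):
        return None
    u0, v0 = int(u), int(v)
    a, b = u - u0, v - v0
    return ((1 - a) * (1 - b) * img[v0][u0] + a * (1 - b) * img[v0][u0 + 1]
            + (1 - a) * b * img[v0 + 1][u0] + a * b * img[v0 + 1][u0 + 1])


def photometric_error(ref_img, ref_depth, cur_img, R, t):
    """Mean squared intensity residual of reference pixels warped into the current frame."""
    total, count = 0.0, 0
    for v in range(H):
        for u in range(W):
            z = ref_depth[v][u]
            if z <= 0:  # skip pixels without a valid depth measurement
                continue
            q = transform(R, t, backproject(u, v, z))
            if q[2] <= 0:  # point ends up behind the current camera
                continue
            sample = bilinear(cur_img, *project(q))
            if sample is None:  # warped outside the current image
                continue
            r = sample - ref_img[v][u]
            total += r * r
            count += 1
    return total / count if count else math.inf


def texture(x):
    """Synthetic intensity pattern that varies along the image x axis."""
    return 100.0 + 50.0 * math.sin(0.7 * x)


# Reference frame: a fronto-parallel plane 2 m in front of the camera.
ref_depth = [[2.0] * W for _ in range(H)]
ref_img = [[texture(u) for u in range(W)] for _ in range(H)]
# Current frame: camera moved 0.02 m along +x, so the scene shifts 1 px to the left.
cur_img = [[texture(u + 1.0) for u in range(W)] for _ in range(H)]

I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
candidates = [i / 100 for i in range(-5, 6)]  # candidate camera x-offsets in metres
# t maps reference-frame points into the current frame: a camera offset of +d gives t = (-d, 0, 0).
errors = {d: photometric_error(ref_img, ref_depth, cur_img, I3, (-d, 0.0, 0.0))
          for d in candidates}
best = min(errors, key=errors.get)
print(f"best x-offset: {best:.2f} m, error {errors[best]:.2e}")
```

The 0.02 m candidate should come out as the minimum, because it maps each warped pixel back onto its true counterpart. The exhaustive search is only viable because this toy problem has a single unknown; it stands in for the iterative optimizer a real direct tracker would use.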

Figs. 1–9 (images not included)

Notes

¹ https://youtu.be/Gy_eA1a86cU
² http://www.danielgm.net/cc/
³ The NPU dataset is publicly available at http://adv-ci.com/rgbd/npu/

Acknowledgments

This work is partly supported by grants from the National Natural Science Foundation of China (61202185, 61473231, 61573284), the Fundamental Research Funds for the Central Universities (310201401-(JCQ01009,JCQ01012)), and the Open Projects Program of the National Laboratory of Pattern Recognition (NLPR).

Author information

Authors and Affiliations

Northwestern Polytechnical University, 710072, Xi’an, China
Shuhui Bu, Yong Zhao, Gong Cheng & Zhenbao Liu
Information Engineering University, 450000, Zhengzhou, China
Gang Wan & Ke Li

Corresponding authors

Correspondence to Shuhui Bu or Zhenbao Liu.

About this article

Cite this article

Bu, S., Zhao, Y., Wan, G. et al. Semi-direct tracking and mapping with RGB-D camera for MAV. Multimed Tools Appl 76, 4445–4469 (2017). https://doi.org/10.1007/s11042-016-3524-x

Received: 31 October 2015
Revised: 20 March 2016
Accepted: 07 April 2016
Published: 22 April 2016
Issue Date: February 2017
DOI: https://doi.org/10.1007/s11042-016-3524-x
