Semi-direct tracking and mapping with RGB-D camera for MAV

Multimedia Tools and Applications

Abstract

In this paper we present a novel semi-direct tracking and mapping (SDTAM) approach for RGB-D cameras that inherits the advantages of both direct and feature-based methods and consequently achieves high efficiency, accuracy, and robustness. Input RGB-D frames are tracked with a direct method, and keyframes are refined by minimizing a proposed measurement residual function that takes both geometric and depth information into account. A local optimization refines the local map, while a global optimization detects and corrects loops using an appearance-based bag of words and a co-visibility weighted pose graph. Our method achieves higher accuracy in both trajectory tracking and surface reconstruction than state-of-the-art frame-to-frame or frame-to-model approaches. We test our system on challenging sequences with motion blur, fast pure rotation, and large moving objects; the results demonstrate that it still obtains highly accurate results. Furthermore, the proposed approach runs in real time using only part of the CPU's computing power, so it can be applied to embedded devices such as phones, tablets, or micro aerial vehicles (MAVs).
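
The abstract does not give the form of the measurement residual, so the following is only an illustrative sketch of how a residual might combine a geometric (reprojection) term with a depth term for a single map point. The function names `project` and `combined_residual`, the weight `w_depth`, and the pinhole intrinsics are assumptions made for this example and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def project(fx: float, fy: float, cx: float, cy: float,
            R: Sequence[Sequence[float]], t: Sequence[float],
            p_world: Sequence[float]) -> Tuple[float, float, float]:
    """Pinhole projection of a world point; returns pixel (u, v) and depth z."""
    # Transform into the camera frame: p_cam = R * p_world + t
    p_cam = [sum(R[i][j] * p_world[j] for j in range(3)) + t[i] for i in range(3)]
    z = p_cam[2]
    u = fx * p_cam[0] / z + cx
    v = fy * p_cam[1] / z + cy
    return u, v, z


def combined_residual(fx: float, fy: float, cx: float, cy: float,
                      R: Sequence[Sequence[float]], t: Sequence[float],
                      p_world: Sequence[float],
                      obs_uv: Sequence[float], obs_depth: float,
                      w_depth: float = 1.0) -> List[float]:
    """Stack a 2D reprojection error with a weighted depth error.

    Hypothetical form for illustration only; the paper's residual may differ.
    """
    u, v, z = project(fx, fy, cx, cy, R, t, p_world)
    return [u - obs_uv[0], v - obs_uv[1], w_depth * (z - obs_depth)]


# Illustrative intrinsics and an identity pose (assumed values).
fx, fy, cx, cy = 525.0, 525.0, 319.5, 239.5
R_identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t_zero = [0.0, 0.0, 0.0]
point = [0.1, -0.2, 2.0]

u, v, z = project(fx, fy, cx, cy, R_identity, t_zero, point)
# A measurement that matches the prediction exactly gives a zero residual.
r_perfect = combined_residual(fx, fy, cx, cy, R_identity, t_zero, point, [u, v], z)
# A measurement off by one pixel in u and 5 cm in depth gives a non-zero residual.
r_shifted = combined_residual(fx, fy, cx, cy, R_identity, t_zero, point,
                              [u + 1.0, v], z + 0.05)
```

In a keyframe refinement step, a residual of this kind would typically be summed over many points and minimized over the pose with a nonlinear least-squares solver; the relative weighting of pixel and depth errors is a design choice this sketch leaves to `w_depth`.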

Notes

  1. https://youtu.be/Gy_eA1a86cU

  2. http://www.danielgm.net/cc/

  3. The NPU dataset is publicly available at http://adv-ci.com/rgbd/npu/

Acknowledgments

This work is partly supported by grants from the National Natural Science Foundation of China (61202185, 61473231, 61573284), the Fundamental Research Funds for the Central Universities (310201401-(JCQ01009,JCQ01012)), and the Open Projects Program of the National Laboratory of Pattern Recognition (NLPR).

Author information

Corresponding authors

Correspondence to Shuhui Bu or Zhenbao Liu.

About this article

Cite this article

Bu, S., Zhao, Y., Wan, G. et al. Semi-direct tracking and mapping with RGB-D camera for MAV. Multimed Tools Appl 76, 4445–4469 (2017). https://doi.org/10.1007/s11042-016-3524-x

  • DOI: https://doi.org/10.1007/s11042-016-3524-x
