Abstract
In this paper, we present a system for estimating the trajectory of a moving RGB-D camera, with applications to building maps of large indoor environments. Unlike most current research, we propose a 'feature model' based RGB-D visual odometry system for a computationally constrained mobile platform, where the feature model is persistent and dynamically updated from new observations using a Kalman filter. We first propose a mixture-of-Gaussians model for estimating the random noise in depth readings, which we use to describe the spatial uncertainty of the feature point cloud. We also introduce a general depth calibration method to remove systematic errors from the depth readings of the RGB-D camera. We provide comprehensive theoretical and experimental analysis to demonstrate that our model-based iterative-closest-point (ICP) algorithm achieves much higher localization accuracy than conventional ICP. The visual odometry runs at 30 Hz or higher on VGA images, in a single thread on a desktop CPU, with no GPU acceleration required. Finally, we examine the problem of place recognition from RGB-D images in order to form a pose-graph SLAM approach for refining the trajectory and closing loops. We evaluate the effectiveness of the system using publicly available datasets with ground-truth data. The entire system is available free and open-source online.
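To make the abstract's key idea concrete: a persistent feature landmark can be refined from each new observation with a Kalman update, where the measurement covariance comes from a depth-dependent noise model. The sketch below is illustrative only; the noise coefficients, the simple quadratic axial model, and the identity observation matrix are placeholder assumptions, not the paper's fitted mixture-of-Gaussians model.

```python
import numpy as np

def depth_noise_std(z, sigma0=0.0012, k=0.0019):
    # Illustrative axial depth-noise model: standard deviation grows
    # quadratically with depth z (coefficients are placeholders, not
    # the paper's calibrated values).
    return sigma0 + k * (z - 0.4) ** 2

def kalman_update(mu, P, z_obs, R):
    # Fuse a new 3-D observation z_obs (covariance R) into the persistent
    # feature estimate (mu, P). Observation matrix H = I, since the
    # feature position is observed directly.
    S = P + R                       # innovation covariance
    K = P @ np.linalg.inv(S)        # Kalman gain
    mu_new = mu + K @ (z_obs - mu)  # updated mean
    P_new = (np.eye(3) - K) @ P     # updated (shrunken) covariance
    return mu_new, P_new

# Example: a feature at ~2 m depth is re-observed with slightly different depth.
mu = np.array([0.10, 0.20, 2.00])
P = np.diag([1e-4, 1e-4, depth_noise_std(2.00) ** 2])
z_obs = np.array([0.11, 0.21, 2.05])
R = np.diag([1e-4, 1e-4, depth_noise_std(2.05) ** 2])
mu_new, P_new = kalman_update(mu, P, z_obs, R)
```

After the update, the fused mean lies between the prior and the observation, and the covariance trace decreases, which is why a persistent, repeatedly updated feature model can be more accurate than matching raw frame-to-frame measurements.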
This work is supported in part by U.S. Army Research Office under Grant No. W911NF-09-1-0565, U.S. National Science Foundation under Grants No. IIS-0644127 and No. CBET-1160046, Federal High-Way Administration (FHWA) under Grant Nos. DTFH61-12-H-00002 and PSC-CUNY under Grant No. 65789-00-43.
Cite this article
Yang, L., Dryanovski, I., Valenti, R.G. et al. RGB-D camera calibration and trajectory estimation for indoor mapping. Auton Robot 44, 1485–1503 (2020). https://doi.org/10.1007/s10514-020-09941-w