Abstract
Accurate localization of a moving object is important in many robotic tasks. Often an elaborate motion capture system is used to achieve it; while high precision is guaranteed, such a system is costly and confined to a small, fixed workspace. This paper describes a lightweight and scalable visual-inertial approach that leverages paper-printable artificial landmarks of known size and unknown pose, called fiducials, to obtain motion state estimates, including pose and velocity. Joint visual-inertial optimization using an incremental smoother over a factor graph, together with the IMU preintegration technique, makes our method efficient and accurate. No special hardware is required beyond a monocular camera and an IMU, making our system lightweight and easy to deploy. The use of paper-printable landmarks, combined with an efficient incremental inference algorithm, gives the method nearly constant-time complexity and makes it scalable to large-scale environments. We perform an extensive evaluation on public datasets and in real-world experiments. The results show that our method achieves accurate state estimates, scales to large environments, and is robust to fast motion and changing lighting conditions. In addition, our method can recover from intermittent failures.
References
Botterill, T., Mills, S., & Green, R. (2013). Correcting scale drift by object recognition in single-camera slam. IEEE Transactions on Cybernetics, 43(6), 1767–1780.
Concha, A., Loianno, G., Kumar, V., & Civera, J. (2016). Visual-inertial direct slam. In IEEE international conference on robotics and automation (pp. 1331–1338).
Dellaert, F. (2012). Factor graphs and gtsam: A hands-on introduction. Atlanta: Georgia Institute of Technology.
Engel, J., Schöps, T., & Cremers, D. (2014). Lsd-slam: Large-scale direct monocular slam. In European conference on computer vision (ECCV) (pp. 834–849).
Engel, J., Koltun, V., & Cremers, D. (2017). Direct sparse odometry. IEEE Transactions on Pattern Analysis & Machine Intelligence, PP(99), 1–1.
Faessler, M., Mueggler, E., Schwabe, K., & Scaramuzza, D. (2014). A monocular pose estimation system based on infrared leds. In IEEE international conference on robotics and automation (pp. 907 – 913).
Fiala, M. (2005). Artag, a fiducial marker system using digital techniques. In IEEE computer society conference on computer vision and pattern recognition (CVPR) (Vol. 2, pp. 590–596).
Fiala, M. (2010). Designing highly reliable fiducial markers. IEEE Transactions on Pattern Analysis & Machine Intelligence, 32(7), 1317–1324.
Forster, C., Pizzoli, M., & Scaramuzza, D. (2014). Svo: Fast semi-direct monocular visual odometry. In IEEE international conference on robotics and automation (pp. 15–22).
Forster, C., Carlone, L., Dellaert, F., & Scaramuzza, D. (2017). On-manifold preintegration for real-time visual-inertial odometry. IEEE Transactions on Robotics, 33(1), 1–21.
Frost, D. P., Kähler, O., & Murray, D. W. (2016). Object-aware bundle adjustment for correcting monocular scale drift. In IEEE international conference on robotics and automation (pp. 4770–4776).
Furgale, P., Rehder, J., & Siegwart, R. (2014). Unified temporal and spatial calibration for multi-sensor systems. In IEEE/RSJ international conference on intelligent robots and systems (pp. 1280–1286).
Gálvez-López, D., Salas, M., Tardós, J. D., & Montiel, J. (2016). Real-time monocular object slam. Robotics & Autonomous Systems, 75(PB), 435–449.
Hauke, S. (2012). Local accuracy and global consistency for efficient slam. London: Imperial College London.
Hauke, S., Montiel, J. M. M., & Davison, A. (2010). Scale drift-aware large scale monocular slam. In Robotics: Science and systems.
Kaess, M., Johannsson, H., Roberts, R., Ila, V., Leonard, J. J., & Dellaert, F. (2011). isam2: Incremental smoothing and mapping using the bayes tree. International Journal of Robotics Research, 31(2), 216–235.
Klein, G., & Murray, D. (2007). Parallel tracking and mapping for small AR workspaces. In IEEE and ACM international symposium on mixed and augmented reality (pp. 1–10).
Leutenegger, S., Lynen, S., Bosse, M., Siegwart, R., & Furgale, P. (2014). Keyframe-based visual-inertial odometry using nonlinear optimization. International Journal of Robotics Research, 34(3), 314–334.
Lim, H., & Lee, Y. S. (2009). Real-time single camera slam using fiducial markers. In ICCAS-SICE (pp. 177–182).
Mourikis, A. I., & Roumeliotis, S. I. (2007). A multi-state constraint kalman filter for vision-aided inertial navigation. In IEEE international conference on robotics and automation (pp. 3565–3572).
Mur-Artal, R., Montiel, J. M. M., & Tardós, J. D. (2015). Orb-slam: A versatile and accurate monocular slam system. IEEE Transactions on Robotics, 31(5), 1147–1163.
Neunert, M., Bloesch, M., & Buchli, J. (2016). An open source, fiducial based, visual-inertial motion capture system. In 19th international conference on information fusion (FUSION).
Olson, E. (2011). Apriltag: A robust and flexible visual fiducial system. In IEEE international conference on robotics and automation (pp. 3400–3407).
Qiu, K., Zhang, F., & Liu, M. (2015). Visible light communication-based indoor environment modeling and metric-free path planning. In IEEE international conference on automation science and engineering (pp. 200–205).
Sementille, A. C., & Rodello, I. (2004). A motion capture system using passive markers. In VRCAI 2004, ACM SIGGRAPH international conference on virtual reality continuum and its applications in industry, Nanyang Technological University, Singapore (pp. 440–447).
Usenko, V., Engel, J., Stuckler, J., & Cremers, D. (2016). Direct visual-inertial odometry with stereo cameras. In IEEE international conference on robotics and automation (pp. 1885–1892).
Acknowledgements
Special thanks are given to Zheming Liu and Junlin Song for their help in data collection.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
A. Background
In the following, we wish to derive the partial derivatives of the transformation \({\varvec{\pi }}\left( {\varvec{T}}\cdot {\varvec{l}}\right) \) with respect to the pose \({\varvec{T}}\). According to Hauke (2012), this can be calculated using the smooth path \(\varvec{T}\left( t\right) ={\varvec{T}}\mathrm{Exp}\left( {\delta {\varvec{\xi }}}\right) \):
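The displayed equation is missing from this version of the appendix. A hedged reconstruction from the stated path and the surrounding definitions (first order in \(\delta {\varvec{\xi }}\), with the ordering \(\left[ \delta {\varvec{\rho }}\; \delta {\varvec{\phi }}\right] ^T\) given below), cited later as Eq. (30), would read:

```latex
\frac{\partial \,\boldsymbol{\pi}\left(\boldsymbol{T}\,\mathrm{Exp}\left(\delta\boldsymbol{\xi}\right)\cdot\boldsymbol{l}\right)}
     {\partial \,\delta\boldsymbol{\xi}}\bigg|_{\delta\boldsymbol{\xi}=\boldsymbol{0}}
= \boldsymbol{J}_r
  \left[\begin{array}{cc}
    \boldsymbol{R} & -\,\boldsymbol{R}\left[\boldsymbol{l}_{1:3}\right]_{\times}
  \end{array}\right],
\qquad
\boldsymbol{T}=\left[\begin{array}{cc}\boldsymbol{R} & \boldsymbol{t}\\ \boldsymbol{0} & 1\end{array}\right]
\tag{30}
```

where \(\left[ \cdot \right] _{\times }\) denotes the skew-symmetric matrix of a 3-vector and \(\boldsymbol{l}_{1:3}\) the inhomogeneous part of the landmark.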
where \({\varvec{q}}={\varvec{T}}\mathrm{Exp}\left( {\delta {\varvec{\xi }}}\right) \cdot {\varvec{l}}\), \({\delta {\varvec{\xi }}}= \left[ \begin{array}{cc} {\delta {\varvec{\rho }}}&{\delta {\varvec{\phi }}} \end{array}\right] ^T\), \({\delta {\varvec{\xi }}}\in \mathfrak {se}(3)\), \({\delta {\varvec{\phi }}}\in \mathfrak {so}(3)\) and \({\delta {\varvec{\rho }}} \in \mathbb {R}^3\). \({\varvec{J}}_r\) denotes the Jacobian matrix of the pinhole camera model with respect to the 3-dimensional landmark point coordinates expressed in the camera frame. We will also use the exponential map property:
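The property itself is omitted here; given how it is used together with Eq. (30) for rotation-only increments below, it is presumably the standard \(\mathrm{SO}(3)\) identity (hedged reconstruction, cited later as Eq. (31)):

```latex
\boldsymbol{R}\,\mathrm{Exp}\left(\delta\boldsymbol{\phi}\right)\boldsymbol{R}^{T}
= \mathrm{Exp}\left(\boldsymbol{R}\,\delta\boldsymbol{\phi}\right)
\quad\Longleftrightarrow\quad
\mathrm{Exp}\left(\delta\boldsymbol{\phi}\right)\boldsymbol{R}
= \boldsymbol{R}\,\mathrm{Exp}\left(\boldsymbol{R}^{T}\delta\boldsymbol{\phi}\right)
\tag{31}
```

This identity lets a perturbation be moved from one side of a rotation to the other, which is what turns the rotational increment below into a Jacobian in the body frame.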
B. Jacobians
This section provides the Jacobians of the tag reprojection error with respect to the moving object pose \(\varvec{T}_{WB}\) and the tag pose \(\varvec{T}_{WA}\). The tag reprojection error of the \(n\mathrm{th}\) corner in the \(j\mathrm{th}\) tag at image time \(t_i\) is
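The error definition is missing from this version. A plausible reconstruction, assuming for compactness that the camera frame is taken as the body frame \(B\) (otherwise a known, constant camera–IMU extrinsic transform enters as an extra factor), is:

```latex
\boldsymbol{e}^{\,n}_{ij}
= \boldsymbol{z}^{\,n}_{ij}
- \boldsymbol{\pi}\left(\boldsymbol{T}_{WB_i}^{-1}\,\boldsymbol{T}_{WA_j}\cdot\boldsymbol{l}^{\,n}_{j}\right)
```

where \(\boldsymbol{z}^{\,n}_{ij}\) is the detected pixel position of the corner and \(\boldsymbol{l}^{\,n}_{j}\) its known homogeneous coordinates in the tag frame.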
1. Jacobian of the tag reprojection error with respect to the moving object pose \(\varvec{T}_{WB}\):
The reprojection error with respect to the rotational increment is:
where \(_W{\varvec{l}}^n=\left[ \begin{array}{cc} _W{\varvec{l}}^n_{1:3}&1 \end{array}\right] ^T={\varvec{T}}_{WA_j}\cdot { \varvec{l}_j^n}\). We can get the Jacobian of the tag reprojection error with respect to the moving object orientation using Eqs. (30) and (31):
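The two displayed equations around this sentence are missing. A hedged reconstruction, consistent with the definitions above (camera frame taken as the body frame, right perturbation \({\varvec{R}}_{WB}\mathrm{Exp}\left( \delta {\varvec{\phi }}\right) \)), is:

```latex
% first-order expansion of the reprojected point under R_WB <- R_WB Exp(dphi)
\mathrm{Exp}\left(-\delta\boldsymbol{\phi}\right)\boldsymbol{R}_{WB_i}^{T}
  \left({}_{W}\boldsymbol{l}^{\,n}_{1:3}-{}_{W}\boldsymbol{t}_{WB_i}\right)
\approx
\boldsymbol{R}_{WB_i}^{T}\left({}_{W}\boldsymbol{l}^{\,n}_{1:3}-{}_{W}\boldsymbol{t}_{WB_i}\right)
+\left[\boldsymbol{R}_{WB_i}^{T}\left({}_{W}\boldsymbol{l}^{\,n}_{1:3}-{}_{W}\boldsymbol{t}_{WB_i}\right)\right]_{\times}\delta\boldsymbol{\phi}

% resulting Jacobian w.r.t. the orientation of the moving object
\frac{\partial\,\boldsymbol{e}^{\,n}_{ij}}{\partial\,\delta\boldsymbol{\phi}}
= -\,\boldsymbol{J}_r
  \left[\boldsymbol{R}_{WB_i}^{T}\left({}_{W}\boldsymbol{l}^{\,n}_{1:3}-{}_{W}\boldsymbol{t}_{WB_i}\right)\right]_{\times}
```

The sign follows from the convention that the error is the measurement minus the prediction; with the opposite convention the Jacobian flips sign.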
The reprojection error with respect to the translational increment is:
and the Jacobian of the tag reprojection error with respect to the moving object translation is:
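Both displayed equations are again missing. Under the same assumptions, and perturbing the translation with the right increment \({\varvec{t}}_{WB} \leftarrow {\varvec{t}}_{WB} + {\varvec{R}}_{WB}\,\delta {\varvec{\rho }}\) (matching the path \({\varvec{T}}\mathrm{Exp}\left( \delta {\varvec{\xi }}\right) \) above), a hedged reconstruction is:

```latex
% the rotation cancels, so the increment acts directly in the body frame
\boldsymbol{R}_{WB_i}^{T}\left({}_{W}\boldsymbol{l}^{\,n}_{1:3}-{}_{W}\boldsymbol{t}_{WB_i}
  -\boldsymbol{R}_{WB_i}\,\delta\boldsymbol{\rho}\right)
= \boldsymbol{R}_{WB_i}^{T}\left({}_{W}\boldsymbol{l}^{\,n}_{1:3}-{}_{W}\boldsymbol{t}_{WB_i}\right)
  -\delta\boldsymbol{\rho}

% resulting Jacobian w.r.t. the translation of the moving object
\frac{\partial\,\boldsymbol{e}^{\,n}_{ij}}{\partial\,\delta\boldsymbol{\rho}}
= \boldsymbol{J}_r
```

If the translation were instead perturbed in the world frame, the Jacobian would carry an extra factor \(\boldsymbol{R}_{WB_i}^{T}\).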
2. Jacobian of the tag reprojection error with respect to the tag pose \(\varvec{T}_{WA}\): The reprojection error with respect to the SE(3) increment is:
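The expansion is missing here. Applying the same first-order, right-perturbation reasoning as Eq. (30) to \({}_W{\varvec{l}}^n = {\varvec{T}}_{WA_j}\mathrm{Exp}\left( \delta {\varvec{\xi }}\right) \cdot {\varvec{l}}_j^n\) gives (hedged reconstruction):

```latex
{}_{W}\boldsymbol{l}^{\,n}_{1:3}\left(\delta\boldsymbol{\xi}\right)
\approx {}_{W}\boldsymbol{l}^{\,n}_{1:3}
+ \boldsymbol{R}_{WA_j}\,\delta\boldsymbol{\rho}
- \boldsymbol{R}_{WA_j}\left[\boldsymbol{l}^{\,n}_{j,1:3}\right]_{\times}\delta\boldsymbol{\phi}
```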
so we can get the Jacobian of the tag reprojection error with respect to the tag pose using Eq. (30):
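The final expression is likewise missing; chaining the tag-pose perturbation through \({\varvec{T}}_{WB_i}^{-1}\) and the camera model gives, under the same assumptions, \(\partial {\varvec{e}}^{\,n}_{ij}/\partial \delta {\varvec{\xi }} = -{\varvec{J}}_r\,{\varvec{R}}_{WB_i}^{T}{\varvec{R}}_{WA_j}\left[ \begin{array}{cc} {\varvec{I}}_3&-\left[ {\varvec{l}}^{\,n}_{j,1:3}\right] _{\times } \end{array}\right] \). Reconstructions like these are easy to get wrong by a sign or a frame, so it is worth verifying the Eq. (30)-style derivative numerically. A minimal sketch (NumPy; the intrinsics, pose, and landmark are arbitrary illustrative values, not from the paper):

```python
import numpy as np

np.random.seed(0)

FX, FY, CX, CY = 450.0, 450.0, 320.0, 240.0  # assumed pinhole intrinsics

def hat(v):
    """Skew-symmetric matrix such that hat(a) @ b == np.cross(a, b)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def exp_se3(xi):
    """SE(3) exponential map; xi = [rho, phi], translation first as in the appendix."""
    rho, phi = xi[:3], xi[3:]
    th = np.linalg.norm(phi)
    P = hat(phi)
    if th < 1e-10:  # small-angle approximation
        R = np.eye(3) + P
        V = np.eye(3) + 0.5 * P
    else:
        a, b = np.sin(th) / th, (1.0 - np.cos(th)) / th**2
        R = np.eye(3) + a * P + b * (P @ P)
        V = np.eye(3) + b * P + (th - np.sin(th)) / th**3 * (P @ P)
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = V @ rho
    return T

def project(p):
    """Pinhole projection of a 3-D point in the camera frame."""
    return np.array([FX * p[0] / p[2] + CX, FY * p[1] / p[2] + CY])

def proj_jacobian(p):
    """2x3 Jacobian J_r of the pinhole model w.r.t. the camera-frame point."""
    x, y, z = p
    return np.array([[FX / z, 0.0, -FX * x / z**2],
                     [0.0, FY / z, -FY * y / z**2]])

# Random pose T and a landmark l kept in front of the camera.
T = exp_se3(np.random.randn(6) * 0.3)
l = np.array([0.2, -0.1, 2.0, 1.0])

q = T @ l
R = T[:3, :3]
# Analytic Jacobian in the Eq. (30) form: J_r [ R | -R [l_{1:3}]_x ]
# for the right-perturbed path T Exp(dxi).
J_analytic = proj_jacobian(q[:3]) @ np.hstack([R, -R @ hat(l[:3])])

# Central finite differences over the same right-perturbation path.
eps = 1e-6
J_numeric = np.zeros((2, 6))
for i in range(6):
    d = np.zeros(6)
    d[i] = eps
    plus = project((T @ exp_se3(d) @ l)[:3])
    minus = project((T @ exp_se3(-d) @ l)[:3])
    J_numeric[:, i] = (plus - minus) / (2.0 * eps)

assert np.allclose(J_analytic, J_numeric, atol=1e-4)
print("max abs error:", np.abs(J_analytic - J_numeric).max())
```

The same finite-difference pattern can be reused for the body-pose and tag-pose Jacobians by perturbing the corresponding factor in the reprojection chain.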
Cite this article
He, G., Zhong, S. & Guo, J. A lightweight and scalable visual-inertial motion capture system using fiducial markers. Auton Robot 43, 1895–1915 (2019). https://doi.org/10.1007/s10514-019-09834-7