
A lightweight and scalable visual-inertial motion capture system using fiducial markers


Abstract

Accurate localization of a moving object is important in many robotic tasks, and an elaborate motion capture system is often used to achieve it. While such systems guarantee high precision, they are costly and confined to a small, fixed workspace. This paper describes a lightweight and scalable visual-inertial approach that leverages paper-printable artificial landmarks of known size but unknown pose, called fiducials, to estimate the motion state, including pose and velocity. Joint visual-inertial optimization with an incremental smoother over a factor graph, combined with the IMU preintegration technique, makes our method efficient and accurate. No special hardware is required beyond a monocular camera and an IMU, so the system is lightweight and easy to deploy. Paper-printable landmarks, together with the efficient incremental inference algorithm, give the method nearly constant-time complexity and make it scalable to large-scale environments. We perform an extensive evaluation of our method on public datasets and in real-world experiments. Results show that our method achieves accurate state estimates, scales to large environments, and is robust to fast motion and changing lighting conditions. In addition, our method can recover from intermittent failures.
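
The estimation backbone described above can be made concrete with a short sketch. The following is a minimal, hypothetical example (ours, not the authors' released code) that assembles the same kind of factor graph in the GTSAM Python bindings (see footnote 2): an IMU preintegration factor between consecutive pose/velocity states and a reprojection factor for a detected tag corner, solved incrementally with iSAM2. All calibration and noise values are placeholders, and each tag corner is modeled as a 3-D point landmark for brevity, whereas the paper estimates a full SE(3) pose per tag.

```python
import numpy as np
import gtsam
from gtsam.symbol_shorthand import B, L, V, X

# IMU preintegration (Forster et al. 2017): gravity along -z, toy noise
# values; real values come from the IMU datasheet and calibration.
params = gtsam.PreintegrationParams.MakeSharedU(9.81)
params.setAccelerometerCovariance(np.eye(3) * 1e-3)
params.setGyroscopeCovariance(np.eye(3) * 1e-4)
params.setIntegrationCovariance(np.eye(3) * 1e-8)
pim = gtsam.PreintegratedImuMeasurements(params, gtsam.imuBias.ConstantBias())

# Integrate the raw IMU samples that arrived between two camera frames
# (placeholder: 20 stationary samples at 200 Hz).
for _ in range(20):
    pim.integrateMeasurement(np.array([0.0, 0.0, 9.81]), np.zeros(3), 0.005)

graph = gtsam.NonlinearFactorGraph()
# Priors anchor the first state; the IMU factor links consecutive states.
graph.add(gtsam.PriorFactorPose3(X(0), gtsam.Pose3(),
                                 gtsam.noiseModel.Isotropic.Sigma(6, 1e-3)))
graph.add(gtsam.PriorFactorVector(V(0), np.zeros(3),
                                  gtsam.noiseModel.Isotropic.Sigma(3, 1e-3)))
graph.add(gtsam.PriorFactorConstantBias(B(0), gtsam.imuBias.ConstantBias(),
                                        gtsam.noiseModel.Isotropic.Sigma(6, 1e-2)))
graph.add(gtsam.ImuFactor(X(0), V(0), X(1), V(1), B(0), pim))

# One reprojection factor per detected tag corner; here a single corner seen
# at the image center from pose X(1). The weak landmark prior stands in for
# the known tag geometry that the paper exploits.
K = gtsam.Cal3_S2(500.0, 500.0, 0.0, 320.0, 240.0)
graph.add(gtsam.GenericProjectionFactorCal3_S2(
    np.array([320.0, 240.0]), gtsam.noiseModel.Isotropic.Sigma(2, 1.0),
    X(1), L(0), K))
graph.add(gtsam.PriorFactorPoint3(L(0), np.array([0.0, 0.0, 2.0]),
                                  gtsam.noiseModel.Isotropic.Sigma(3, 0.1)))

# Initial guesses, then one incremental iSAM2 update per camera frame.
values = gtsam.Values()
values.insert(X(0), gtsam.Pose3())
values.insert(X(1), gtsam.Pose3())
values.insert(V(0), np.zeros(3))
values.insert(V(1), np.zeros(3))
values.insert(B(0), gtsam.imuBias.ConstantBias())
values.insert(L(0), np.array([0.0, 0.0, 2.0]))

isam = gtsam.ISAM2()
isam.update(graph, values)
print(isam.calculateEstimate().atPose3(X(1)))
```

In the full system, one such incremental update runs per camera frame, which is what keeps the complexity nearly constant over time.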

Notes

  1. https://code.google.com/archive/p/cv2cg/.

  2. https://bitbucket.org/gtborg/gtsam/.

  3. https://bitbucket.org/adrlab/rcars/wiki/Home.

References

  • Botterill, T., Mills, S., & Green, R. (2013). Correcting scale drift by object recognition in single-camera SLAM. IEEE Transactions on Cybernetics, 43(6), 1767–1780.

  • Concha, A., Loianno, G., Kumar, V., & Civera, J. (2016). Visual-inertial direct SLAM. In IEEE international conference on robotics and automation (pp. 1331–1338).

  • Dellaert, F. (2012). Factor graphs and GTSAM: A hands-on introduction. Atlanta: Georgia Institute of Technology.

  • Engel, J., Schöps, T., & Cremers, D. (2014). LSD-SLAM: Large-scale direct monocular SLAM. In European conference on computer vision (ECCV) (pp. 834–849).

  • Engel, J., Koltun, V., & Cremers, D. (2018). Direct sparse odometry. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(3), 611–625.

  • Faessler, M., Mueggler, E., Schwabe, K., & Scaramuzza, D. (2014). A monocular pose estimation system based on infrared LEDs. In IEEE international conference on robotics and automation (pp. 907–913).

  • Fiala, M. (2005). ARTag, a fiducial marker system using digital techniques. In IEEE computer society conference on computer vision and pattern recognition (CVPR) (Vol. 2, pp. 590–596).

  • Fiala, M. (2010). Designing highly reliable fiducial markers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(7), 1317–1324.

  • Forster, C., Pizzoli, M., & Scaramuzza, D. (2014). SVO: Fast semi-direct monocular visual odometry. In IEEE international conference on robotics and automation (pp. 15–22).

  • Forster, C., Carlone, L., Dellaert, F., & Scaramuzza, D. (2017). On-manifold preintegration for real-time visual-inertial odometry. IEEE Transactions on Robotics, 33(1), 1–21.

  • Frost, D. P., Kähler, O., & Murray, D. W. (2016). Object-aware bundle adjustment for correcting monocular scale drift. In IEEE international conference on robotics and automation (pp. 4770–4776).

  • Furgale, P., Rehder, J., & Siegwart, R. (2014). Unified temporal and spatial calibration for multi-sensor systems. In IEEE/RSJ international conference on intelligent robots and systems (pp. 1280–1286).

  • Gálvez-López, D., Salas, M., Tardós, J. D., & Montiel, J. M. M. (2016). Real-time monocular object SLAM. Robotics and Autonomous Systems, 75, 435–449.

  • Kaess, M., Johannsson, H., Roberts, R., Ila, V., Leonard, J. J., & Dellaert, F. (2012). iSAM2: Incremental smoothing and mapping using the Bayes tree. International Journal of Robotics Research, 31(2), 216–235.

  • Klein, G., & Murray, D. (2007). Parallel tracking and mapping for small AR workspaces. In IEEE and ACM international symposium on mixed and augmented reality (pp. 1–10).

  • Leutenegger, S., Lynen, S., Bosse, M., Siegwart, R., & Furgale, P. (2015). Keyframe-based visual-inertial odometry using nonlinear optimization. International Journal of Robotics Research, 34(3), 314–334.

  • Lim, H., & Lee, Y. S. (2009). Real-time single camera SLAM using fiducial markers. In ICCAS-SICE (pp. 177–182).

  • Mourikis, A. I., & Roumeliotis, S. I. (2007). A multi-state constraint Kalman filter for vision-aided inertial navigation. In IEEE international conference on robotics and automation (pp. 3565–3572).

  • Mur-Artal, R., Montiel, J. M. M., & Tardós, J. D. (2015). ORB-SLAM: A versatile and accurate monocular SLAM system. IEEE Transactions on Robotics, 31(5), 1147–1163.

  • Neunert, M., Bloesch, M., & Buchli, J. (2016). An open source, fiducial based, visual-inertial motion capture system. In IEEE/RSJ international conference on intelligent robots and systems (IROS).

  • Olson, E. (2011). AprilTag: A robust and flexible visual fiducial system. In IEEE international conference on robotics and automation (pp. 3400–3407).

  • Qiu, K., Zhang, F., & Liu, M. (2015). Visible light communication-based indoor environment modeling and metric-free path planning. In IEEE international conference on automation science and engineering (pp. 200–205).

  • Sementille, A. C., & Rodello, I. (2004). A motion capture system using passive markers. In ACM SIGGRAPH international conference on virtual reality continuum and its applications in industry (VRCAI), Nanyang Technological University, Singapore (pp. 440–447).

  • Strasdat, H. (2012). Local accuracy and global consistency for efficient visual SLAM. PhD thesis, Imperial College London.

  • Strasdat, H., Montiel, J. M. M., & Davison, A. J. (2010). Scale drift-aware large scale monocular SLAM. In Robotics: Science and Systems.

  • Usenko, V., Engel, J., Stückler, J., & Cremers, D. (2016). Direct visual-inertial odometry with stereo cameras. In IEEE international conference on robotics and automation (pp. 1885–1892).


Acknowledgements

Special thanks are given to Zheming Liu and Junlin Song for their help in data collection.

Author information

Corresponding author

Correspondence to Guoping He.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

A. Background

In the following, we derive the partial derivatives of the projection \({\varvec{\pi }}\left( {\varvec{T}}\cdot {\varvec{l}}\right) \) with respect to the pose \({\varvec{T}}\). Following Strasdat (2012), these can be calculated along the smooth path \(\varvec{T}\left( t\right) ={\varvec{T}}\mathrm{Exp}\left( {\delta {\varvec{\xi }}}\right) \):

$$\begin{aligned} \begin{aligned} \frac{\partial {\varvec{\pi }}\left( {\varvec{T}}\mathrm{Exp}\left( {\delta {\varvec{\xi }}}\right) \cdot {\varvec{l}}\right) }{\partial {\delta {\varvec{\xi }}}}&=\frac{\partial {\varvec{\pi }}\left( {\varvec{q}}\right) }{\partial {\varvec{q}}}{\Bigg |}_{{\varvec{q}}={\varvec{T}}\cdot {\varvec{l}}}\frac{\partial {\varvec{T}}\mathrm{Exp}\left( {\delta {\varvec{\xi }}}\right) \cdot {\varvec{l}}}{\partial {\delta {\varvec{\xi }}}}{\Bigg |}_{{\delta {\varvec{\xi }}}={\varvec{0}}}\\&={\varvec{J}}_r{\varvec{T}}\left[ \begin{array}{cc} {\varvec{I}}_{3\times 3} &{} -{\varvec{l}}_{1:3}^\wedge \\ {\varvec{0}}_{1\times 3} &{} {\varvec{0}}_{1\times 3} \end{array}\right] \end{aligned} \end{aligned}$$
(30)

where \({\varvec{q}}={\varvec{T}}\mathrm{Exp}\left( {\delta {\varvec{\xi }}}\right) \cdot {\varvec{l}}\), \({\delta {\varvec{\xi }}}= \left[ \begin{array}{cc} {\delta {\varvec{\rho }}}&{\delta {\varvec{\phi }}} \end{array}\right] ^T\in \mathfrak {se}(3)\), with \({\delta {\varvec{\phi }}}\in \mathfrak {so}(3)\) and \({\delta {\varvec{\rho }}} \in \mathbb {R}^3\). \({\varvec{J}}_r\) denotes the Jacobian matrix of the pinhole camera model with respect to the 3-dimensional landmark point coordinates expressed in the camera frame. We will also use the exponential map property:

$$\begin{aligned} \mathrm{Exp}\left( -\delta {\varvec{\phi }}\right) ^T=\mathrm{Exp}\left( \delta {\varvec{\phi }}\right) \end{aligned}$$
(31)
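
Equations (30) and (31) are straightforward to validate numerically. The sketch below (ours, not from the paper) assumes the ordering \({\delta {\varvec{\xi }}}=\left[ {\delta {\varvec{\rho }}};\,{\delta {\varvec{\phi }}}\right] \), implements \(\mathrm{Exp}\) as the matrix exponential of the \(4\times 4\) twist, uses a unit-focal pinhole projection, and checks the analytic Jacobian of Eq. (30) against central finite differences; the helper names (hat, Exp, proj) are our own.

```python
import numpy as np
from scipy.linalg import expm

def hat(v):
    # so(3) hat operator: maps a 3-vector to its skew-symmetric matrix.
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def Exp(xi):
    # SE(3) exponential of xi = [rho; phi] via the 4x4 twist matrix.
    W = np.zeros((4, 4))
    W[:3, :3] = hat(xi[3:])
    W[:3, 3] = xi[:3]
    return expm(W)

def proj(q):
    # Unit-focal pinhole projection of a homogeneous point.
    return q[:2] / q[2]

def dproj_dq(q):
    # 2x4 Jacobian of proj() w.r.t. the homogeneous point q (J_r, padded).
    x, y, z = q[0], q[1], q[2]
    return np.array([[1/z, 0.0, -x/z**2, 0.0],
                     [0.0, 1/z, -y/z**2, 0.0]])

rng = np.random.default_rng(0)
T = Exp(rng.normal(size=6) * 0.3)        # arbitrary test pose
l = np.array([0.2, -0.1, 2.0, 1.0])      # homogeneous landmark

# Analytic Jacobian, Eq. (30): J_r * T * [[I, -l^], [0, 0]].
M = np.zeros((4, 6))
M[:3, :3] = np.eye(3)
M[:3, 3:] = -hat(l[:3])
J_analytic = dproj_dq(T @ l) @ T @ M

# Central finite differences over the six perturbation directions.
eps, J_numeric = 1e-6, np.zeros((2, 6))
for k in range(6):
    d = np.zeros(6)
    d[k] = eps
    J_numeric[:, k] = (proj(T @ Exp(d) @ l) - proj(T @ Exp(-d) @ l)) / (2 * eps)

assert np.allclose(J_analytic, J_numeric, atol=1e-6)
```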

B. Jacobians

This section provides the Jacobians of the tag reprojection error with respect to the moving object pose \(\varvec{T}_{WB}\) and the tag pose \(\varvec{T}_{WA}\). The tag reprojection error of the \(n\mathrm{th}\) corner in the \(j\mathrm{th}\) tag at image time \(t_i\) is

$$\begin{aligned} \varvec{e}^{i,j,n}_{re} = \varvec{z}^{i,j,n}-{\varvec{\pi }} \left( \varvec{T}_{CB} \left( \varvec{T}_{WB}^i\right) ^{-1} \varvec{T}_{WA_j}\cdot {\varvec{l}}_j^n \right) \end{aligned}$$
(32)

1. Jacobian of the tag reprojection error with respect to the moving object pose \(\varvec{T}_{WB}\):

The reprojection error with respect to the rotational increment is:

$$\begin{aligned} \begin{aligned}&\varvec{e}^{i,j,n}_{re}\left( \varvec{R}_{WB}^i\mathrm{Exp}\left( \delta {\varvec{\phi }}_R^i\right) \right) \\&\quad = \varvec{z}^{i,j,n}-\pi \left( \varvec{T}^i_{CB}\left[ \begin{array}{cc} \left( {\varvec{R}}^i_{WB}\mathrm{Exp}\left( \delta {\varvec{\phi }}_R^i\right) \right) ^T &{} -\left( {\varvec{R}}^i_{WB}\mathrm{Exp}\left( \delta {\varvec{\phi }}_R^i\right) \right) ^T{}_W{\varvec{p}}^i_{WB}\\ {\varvec{0}}_{1\times 3} &{} 1 \end{array} \right] \varvec{T}_{WA_j}\cdot \varvec{l}_j^n \right) \\&\quad = \varvec{z}^{i,j,n}-\pi \left( \varvec{T}^i_{CB}\left[ \begin{array}{c} \mathrm{Exp}\left( -\delta {\varvec{\phi }}_R^i\right) {\varvec{R}}^{i\;T}_{WB}\left( {}_W{\varvec{l}}_{j\;1:3}^n-{}_W{\varvec{p}}^i_{WB} \right) \\ 1 \end{array}\right] \right) \end{aligned} \end{aligned}$$
(33)

where \({}_W{\varvec{l}}_j^n=\left[ \begin{array}{cc} {}_W{\varvec{l}}^n_{j\;1:3}&1 \end{array}\right] ^T={\varvec{T}}_{WA_j}\cdot {\varvec{l}}_j^n\). Using Eqs. (30) and (31), the Jacobian of the tag reprojection error with respect to the moving object orientation is:

$$\begin{aligned} \frac{\partial \varvec{e}^{i,j,n}_{re}}{\partial {\delta {\varvec{\phi }}_R^i}}=-{\varvec{J}}_{r\;j,n}{\varvec{T}}_{CB}^i\left[ \begin{array}{c} \left( {\varvec{R}}^{i\;T}_{WB}\left( {}_W{\varvec{l}}_{j\;1:3}^n-{}_W{\varvec{p}}^i_{WB} \right) \right) ^\wedge \\ {\varvec{0}}_{1\times 3} \end{array}\right] \end{aligned}$$
(34)

The reprojection error with respect to the translational increment is:

$$\begin{aligned} \begin{aligned}&\varvec{e}^{i,j,n}_{re}\left( {}_W{\varvec{p}}^i_{WB}+\delta {}_W{\varvec{p}}^i_{WB}\right) \\&\quad = \varvec{z}^{i,j,n}-\pi \left( \varvec{T}^i_{CB}\left[ \begin{array}{c} {\varvec{R}}^{i\;T}_{WB}\left( {}_W{\varvec{l}}_{j\;1:3}^n-{}_W{\varvec{p}}^i_{WB}-\delta {}_W{\varvec{p}}^i_{WB} \right) \\ 1 \end{array}\right] \right) \end{aligned} \end{aligned}$$
(35)

and the Jacobian of the tag reprojection error with respect to the moving object translation is:

$$\begin{aligned} \frac{\partial \varvec{e}^{i,j,n}_{re}}{\partial \delta {}_W{\varvec{p}}^i_{WB}}={\varvec{J}}_{r\;j,n}{\varvec{T}}_{CB}^i\left[ \begin{array}{c} {\varvec{R}}^{i\;T}_{WB}\\ {\varvec{0}}_{1\times 3} \end{array}\right] \end{aligned}$$
(36)

2. Jacobian of the tag reprojection error with respect to the tag pose \(\varvec{T}_{WA}\):

The reprojection error with respect to the SE(3) increment is:

$$\begin{aligned} \begin{aligned}&\varvec{e}^{i,j,n}_{re}\left( \varvec{T}_{WA_j}\mathrm{Exp}\left( \delta {\varvec{\xi }}_F^j\right) \right) \\&\quad = {\varvec{z}}^{i,j,n}-\pi \left( \varvec{T}_{CB} \left( \varvec{T}_{WB}^i\right) ^{-1} {\varvec{T}}_{WA_j}\mathrm{Exp}\left( \delta {\varvec{\xi }}_F^j\right) \cdot \varvec{l}_j^n \right) \end{aligned} \end{aligned}$$
(37)

so we can get the Jacobian of the tag reprojection error with respect to the tag pose using Eq. (30):

$$\begin{aligned} \frac{\partial \varvec{e}^{i,j,n}_{re}}{\partial {\delta {\varvec{\xi }}_F^j}}=-{\varvec{J}}_{r\;j,n}{\varvec{T}}_{CB}^i\left( {\varvec{T}}_{WB}^i\right) ^{-1}{\varvec{T}}_{WA_j}\left[ \begin{array}{cc} {\varvec{I}}_{3\times 3} &{} -{\varvec{l}}_{j\;1:3}^\wedge \\ {\varvec{0}}_{1\times 3} &{} {\varvec{0}}_{1\times 3} \end{array}\right] \end{aligned}$$
(38)
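
As a sanity check, the Jacobians of Eqs. (34), (36) and (38) can be verified against finite differences in the same way. The sketch below (ours, not from the paper) perturbs the body rotation, the body translation, and the tag pose in turn; the helper functions repeat the conventions assumed in the Appendix A sketch, and all test poses and the tag corner are arbitrary placeholder values.

```python
import numpy as np
from scipy.linalg import expm

def hat(v):
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def Exp(xi):                      # SE(3) exponential, xi = [rho; phi]
    W = np.zeros((4, 4)); W[:3, :3] = hat(xi[3:]); W[:3, 3] = xi[:3]
    return expm(W)

def proj(q):                      # unit-focal pinhole projection
    return q[:2] / q[2]

def dproj_dq(q):                  # 2x4 Jacobian of proj (J_r, padded)
    x, y, z = q[0], q[1], q[2]
    return np.array([[1/z, 0.0, -x/z**2, 0.0],
                     [0.0, 1/z, -y/z**2, 0.0]])

rng = np.random.default_rng(1)
T_CB = Exp(rng.normal(size=6) * 0.1)                   # body-to-camera extrinsics
T_WB = Exp(rng.normal(size=6) * 0.3)                   # moving object pose
T_WA = Exp(np.array([0.0, 0.0, 3.0, 0.0, 0.0, 0.0]))   # tag pose, 3 m ahead
l = np.array([0.1, 0.1, 0.0, 1.0])                     # tag corner, tag frame

def err(T_WB, T_WA):              # reprojection error; constant z dropped
    return -proj(T_CB @ np.linalg.inv(T_WB) @ T_WA @ l)

R, p = T_WB[:3, :3], T_WB[:3, 3]
J_r = dproj_dq(T_CB @ np.linalg.inv(T_WB) @ T_WA @ l)
lW = (T_WA @ l)[:3]

A = np.zeros((4, 3)); A[:3, :3] = hat(R.T @ (lW - p))
J34 = -J_r @ T_CB @ A                                  # Eq. (34), rotation
Bm = np.zeros((4, 3)); Bm[:3, :3] = R.T
J36 = J_r @ T_CB @ Bm                                  # Eq. (36), translation
M = np.zeros((4, 6)); M[:3, :3] = np.eye(3); M[:3, 3:] = -hat(l[:3])
J38 = -J_r @ T_CB @ np.linalg.inv(T_WB) @ T_WA @ M     # Eq. (38), tag pose

eps = 1e-6
for k in range(3):
    d = np.zeros(3); d[k] = eps
    Tp, Tm = T_WB.copy(), T_WB.copy()
    Tp[:3, :3], Tm[:3, :3] = R @ expm(hat(d)), R @ expm(hat(-d))
    assert np.allclose(J34[:, k], (err(Tp, T_WA) - err(Tm, T_WA)) / (2*eps), atol=1e-5)
    Tp, Tm = T_WB.copy(), T_WB.copy()
    Tp[:3, 3], Tm[:3, 3] = p + d, p - d
    assert np.allclose(J36[:, k], (err(Tp, T_WA) - err(Tm, T_WA)) / (2*eps), atol=1e-5)
for k in range(6):
    d = np.zeros(6); d[k] = eps
    assert np.allclose(J38[:, k],
                       (err(T_WB, T_WA @ Exp(d)) - err(T_WB, T_WA @ Exp(-d))) / (2*eps),
                       atol=1e-5)
```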


About this article


Cite this article

He, G., Zhong, S. & Guo, J. A lightweight and scalable visual-inertial motion capture system using fiducial markers. Auton Robot 43, 1895–1915 (2019). https://doi.org/10.1007/s10514-019-09834-7


Keywords

Navigation