
A lightweight and scalable visual-inertial motion capture system using fiducial markers


Abstract

Accurate localization of a moving object is important in many robotic tasks, and an elaborate motion capture system is often used to achieve it. While such systems guarantee high precision, they are costly and confined to a small, fixed workspace. This paper describes a lightweight and scalable visual-inertial approach that leverages paper-printable artificial landmarks of known size but unknown pose, called fiducials, to estimate the motion state, including pose and velocity. Joint visual-inertial optimization with an incremental smoother over a factor graph, combined with the IMU preintegration technique, makes our method efficient and accurate. No special hardware is required beyond a monocular camera and an IMU, so the system is lightweight and easy to deploy. Paper-printable landmarks, together with the efficient incremental inference algorithm, give the method nearly constant-time complexity and make it scalable to large-scale environments. We perform an extensive evaluation of our method on public datasets and in real-world experiments. Results show that our method achieves accurate state estimates, scales to large environments, and is robust to fast motion and changing lighting conditions. In addition, our method can recover from intermittent failures.
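
The estimation backbone described above can be made concrete with a short sketch. The following is a minimal, hypothetical example (ours, not the authors' released code) that assembles the same kind of factor graph in the GTSAM Python bindings (see footnote 2): an IMU preintegration factor between consecutive pose/velocity states and a reprojection factor for a detected tag corner, solved incrementally with iSAM2. All calibration and noise values are placeholders, and each tag corner is modeled as a 3-D point landmark for brevity, whereas the paper estimates a full SE(3) pose per tag.

```python
import numpy as np
import gtsam
from gtsam.symbol_shorthand import B, L, V, X

# IMU preintegration (Forster et al. 2017): gravity along -z, toy noise
# values; real values come from the IMU datasheet and calibration.
params = gtsam.PreintegrationParams.MakeSharedU(9.81)
params.setAccelerometerCovariance(np.eye(3) * 1e-3)
params.setGyroscopeCovariance(np.eye(3) * 1e-4)
params.setIntegrationCovariance(np.eye(3) * 1e-8)
pim = gtsam.PreintegratedImuMeasurements(params, gtsam.imuBias.ConstantBias())

# Integrate the raw IMU samples that arrived between two camera frames
# (placeholder: 20 stationary samples at 200 Hz).
for _ in range(20):
    pim.integrateMeasurement(np.array([0.0, 0.0, 9.81]), np.zeros(3), 0.005)

graph = gtsam.NonlinearFactorGraph()
# Priors anchor the first state; the IMU factor links consecutive states.
graph.add(gtsam.PriorFactorPose3(X(0), gtsam.Pose3(),
                                 gtsam.noiseModel.Isotropic.Sigma(6, 1e-3)))
graph.add(gtsam.PriorFactorVector(V(0), np.zeros(3),
                                  gtsam.noiseModel.Isotropic.Sigma(3, 1e-3)))
graph.add(gtsam.PriorFactorConstantBias(B(0), gtsam.imuBias.ConstantBias(),
                                        gtsam.noiseModel.Isotropic.Sigma(6, 1e-2)))
graph.add(gtsam.ImuFactor(X(0), V(0), X(1), V(1), B(0), pim))

# One reprojection factor per detected tag corner; here a single corner seen
# at the image center from pose X(1). The weak landmark prior stands in for
# the known tag geometry that the paper exploits.
K = gtsam.Cal3_S2(500.0, 500.0, 0.0, 320.0, 240.0)
graph.add(gtsam.GenericProjectionFactorCal3_S2(
    np.array([320.0, 240.0]), gtsam.noiseModel.Isotropic.Sigma(2, 1.0),
    X(1), L(0), K))
graph.add(gtsam.PriorFactorPoint3(L(0), np.array([0.0, 0.0, 2.0]),
                                  gtsam.noiseModel.Isotropic.Sigma(3, 0.1)))

# Initial guesses, then one incremental iSAM2 update per camera frame.
values = gtsam.Values()
values.insert(X(0), gtsam.Pose3())
values.insert(X(1), gtsam.Pose3())
values.insert(V(0), np.zeros(3))
values.insert(V(1), np.zeros(3))
values.insert(B(0), gtsam.imuBias.ConstantBias())
values.insert(L(0), np.array([0.0, 0.0, 2.0]))

isam = gtsam.ISAM2()
isam.update(graph, values)
print(isam.calculateEstimate().atPose3(X(1)))
```

In the full system, one such incremental update runs per camera frame, which is what keeps the complexity nearly constant over time.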

Notes

  1. https://code.google.com/archive/p/cv2cg/.

  2. https://bitbucket.org/gtborg/gtsam/.

  3. https://bitbucket.org/adrlab/rcars/wiki/Home.

References

  • Botterill, T., Mills, S., & Green, R. (2013). Correcting scale drift by object recognition in single-camera SLAM. IEEE Transactions on Cybernetics, 43(6), 1767–1780.

  • Concha, A., Loianno, G., Kumar, V., & Civera, J. (2016). Visual-inertial direct SLAM. In IEEE international conference on robotics and automation (pp. 1331–1338).

  • Dellaert, F. (2012). Factor graphs and GTSAM: A hands-on introduction. Atlanta: Georgia Institute of Technology.

  • Engel, J., Schöps, T., & Cremers, D. (2014). LSD-SLAM: Large-scale direct monocular SLAM. In European conference on computer vision (ECCV) (pp. 834–849).

  • Engel, J., Koltun, V., & Cremers, D. (2018). Direct sparse odometry. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(3), 611–625.

  • Faessler, M., Mueggler, E., Schwabe, K., & Scaramuzza, D. (2014). A monocular pose estimation system based on infrared LEDs. In IEEE international conference on robotics and automation (pp. 907–913).

  • Fiala, M. (2005). ARTag, a fiducial marker system using digital techniques. In IEEE computer society conference on computer vision and pattern recognition (CVPR) (Vol. 2, pp. 590–596).

  • Fiala, M. (2010). Designing highly reliable fiducial markers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(7), 1317–1324.

  • Forster, C., Pizzoli, M., & Scaramuzza, D. (2014). SVO: Fast semi-direct monocular visual odometry. In IEEE international conference on robotics and automation (pp. 15–22).

  • Forster, C., Carlone, L., Dellaert, F., & Scaramuzza, D. (2017). On-manifold preintegration for real-time visual-inertial odometry. IEEE Transactions on Robotics, 33(1), 1–21.

  • Frost, D. P., Kähler, O., & Murray, D. W. (2016). Object-aware bundle adjustment for correcting monocular scale drift. In IEEE international conference on robotics and automation (pp. 4770–4776).

  • Furgale, P., Rehder, J., & Siegwart, R. (2014). Unified temporal and spatial calibration for multi-sensor systems. In IEEE/RSJ international conference on intelligent robots and systems (pp. 1280–1286).

  • Gálvez-López, D., Salas, M., Tardós, J. D., & Montiel, J. M. M. (2016). Real-time monocular object SLAM. Robotics and Autonomous Systems, 75, 435–449.

  • Kaess, M., Johannsson, H., Roberts, R., Ila, V., Leonard, J. J., & Dellaert, F. (2012). iSAM2: Incremental smoothing and mapping using the Bayes tree. International Journal of Robotics Research, 31(2), 216–235.

  • Klein, G., & Murray, D. (2007). Parallel tracking and mapping for small AR workspaces. In IEEE and ACM international symposium on mixed and augmented reality (pp. 1–10).

  • Leutenegger, S., Lynen, S., Bosse, M., Siegwart, R., & Furgale, P. (2015). Keyframe-based visual-inertial odometry using nonlinear optimization. International Journal of Robotics Research, 34(3), 314–334.

  • Lim, H., & Lee, Y. S. (2009). Real-time single camera SLAM using fiducial markers. In ICCAS-SICE (pp. 177–182).

  • Mourikis, A. I., & Roumeliotis, S. I. (2007). A multi-state constraint Kalman filter for vision-aided inertial navigation. In IEEE international conference on robotics and automation (pp. 3565–3572).

  • Mur-Artal, R., Montiel, J. M. M., & Tardós, J. D. (2015). ORB-SLAM: A versatile and accurate monocular SLAM system. IEEE Transactions on Robotics, 31(5), 1147–1163.

  • Neunert, M., Bloesch, M., & Buchli, J. (2016). An open source, fiducial based, visual-inertial motion capture system. In IEEE/RSJ international conference on intelligent robots and systems (IROS).

  • Olson, E. (2011). AprilTag: A robust and flexible visual fiducial system. In IEEE international conference on robotics and automation (pp. 3400–3407).

  • Qiu, K., Zhang, F., & Liu, M. (2015). Visible light communication-based indoor environment modeling and metric-free path planning. In IEEE international conference on automation science and engineering (pp. 200–205).

  • Sementille, A. C., & Rodello, I. (2004). A motion capture system using passive markers. In ACM SIGGRAPH international conference on virtual reality continuum and its applications in industry (VRCAI), Nanyang Technological University, Singapore (pp. 440–447).

  • Strasdat, H. (2012). Local accuracy and global consistency for efficient visual SLAM. PhD thesis, Imperial College London.

  • Strasdat, H., Montiel, J. M. M., & Davison, A. J. (2010). Scale drift-aware large scale monocular SLAM. In Robotics: Science and Systems.

  • Usenko, V., Engel, J., Stückler, J., & Cremers, D. (2016). Direct visual-inertial odometry with stereo cameras. In IEEE international conference on robotics and automation (pp. 1885–1892).


Acknowledgements

Special thanks are given to Zheming Liu and Junlin Song for their help in data collection.

Author information

Corresponding author

Correspondence to Guoping He.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

A. Background

In the following, we derive the partial derivatives of the projection \({\varvec{\pi }}\left( {\varvec{T}}\cdot {\varvec{l}}\right) \) with respect to the pose \({\varvec{T}}\). Following Strasdat (2012), these can be calculated along the smooth path \(\varvec{T}\left( t\right) ={\varvec{T}}\mathrm{Exp}\left( {\delta {\varvec{\xi }}}\right) \):

$$\begin{aligned} \begin{aligned} \frac{\partial {\varvec{\pi }}\left( {\varvec{T}}\mathrm{Exp}\left( {\delta {\varvec{\xi }}}\right) \cdot {\varvec{l}}\right) }{\partial {\delta {\varvec{\xi }}}}&=\frac{\partial {\varvec{\pi }}\left( {\varvec{q}}\right) }{\partial {\varvec{q}}}{\Bigg |}_{{\varvec{q}}={\varvec{T}}\cdot {\varvec{l}}}\frac{\partial {\varvec{T}}\mathrm{Exp}\left( {\delta {\varvec{\xi }}}\right) \cdot {\varvec{l}}}{\partial {\delta {\varvec{\xi }}}}{\Bigg |}_{{\delta {\varvec{\xi }}}={\varvec{0}}}\\&={\varvec{J}}_r{\varvec{T}}\left[ \begin{array}{cc} {\varvec{I}}_{3\times 3} &{} -{\varvec{l}}_{1:3}^\wedge \\ {\varvec{0}}_{1\times 3} &{} {\varvec{0}}_{1\times 3} \end{array}\right] \end{aligned} \end{aligned}$$
(30)

where \({\varvec{q}}={\varvec{T}}\mathrm{Exp}\left( {\delta {\varvec{\xi }}}\right) \cdot {\varvec{l}}\), \({\delta {\varvec{\xi }}}= \left[ \begin{array}{cc} {\delta {\varvec{\rho }}}&{\delta {\varvec{\phi }}} \end{array}\right] ^T\in \mathfrak {se}(3)\), with \({\delta {\varvec{\phi }}}\in \mathfrak {so}(3)\) and \({\delta {\varvec{\rho }}} \in \mathbb {R}^3\). \({\varvec{J}}_r\) denotes the Jacobian matrix of the pinhole camera model with respect to the 3-dimensional landmark point coordinates expressed in the camera frame. We will also use the exponential map property:

$$\begin{aligned} \mathrm{Exp}\left( -\delta {\varvec{\phi }}\right) ^T=\mathrm{Exp}\left( \delta {\varvec{\phi }}\right) \end{aligned}$$
(31)
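
Equations (30) and (31) are straightforward to validate numerically. The sketch below (ours, not from the paper) assumes the ordering \({\delta {\varvec{\xi }}}=\left[ {\delta {\varvec{\rho }}};\,{\delta {\varvec{\phi }}}\right] \), implements \(\mathrm{Exp}\) as the matrix exponential of the \(4\times 4\) twist, uses a unit-focal pinhole projection, and checks the analytic Jacobian of Eq. (30) against central finite differences; the helper names (hat, Exp, proj) are our own.

```python
import numpy as np
from scipy.linalg import expm

def hat(v):
    # so(3) hat operator: maps a 3-vector to its skew-symmetric matrix.
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def Exp(xi):
    # SE(3) exponential of xi = [rho; phi] via the 4x4 twist matrix.
    W = np.zeros((4, 4))
    W[:3, :3] = hat(xi[3:])
    W[:3, 3] = xi[:3]
    return expm(W)

def proj(q):
    # Unit-focal pinhole projection of a homogeneous point.
    return q[:2] / q[2]

def dproj_dq(q):
    # 2x4 Jacobian of proj() w.r.t. the homogeneous point q (J_r, padded).
    x, y, z = q[0], q[1], q[2]
    return np.array([[1/z, 0.0, -x/z**2, 0.0],
                     [0.0, 1/z, -y/z**2, 0.0]])

rng = np.random.default_rng(0)
T = Exp(rng.normal(size=6) * 0.3)        # arbitrary test pose
l = np.array([0.2, -0.1, 2.0, 1.0])      # homogeneous landmark

# Analytic Jacobian, Eq. (30): J_r * T * [[I, -l^], [0, 0]].
M = np.zeros((4, 6))
M[:3, :3] = np.eye(3)
M[:3, 3:] = -hat(l[:3])
J_analytic = dproj_dq(T @ l) @ T @ M

# Central finite differences over the six perturbation directions.
eps, J_numeric = 1e-6, np.zeros((2, 6))
for k in range(6):
    d = np.zeros(6)
    d[k] = eps
    J_numeric[:, k] = (proj(T @ Exp(d) @ l) - proj(T @ Exp(-d) @ l)) / (2 * eps)

assert np.allclose(J_analytic, J_numeric, atol=1e-6)
```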

B. Jacobians

This section provides the Jacobians of the tag reprojection error with respect to the moving object pose \(\varvec{T}_{WB}\) and the tag pose \(\varvec{T}_{WA}\). The tag reprojection error of the \(n\mathrm{th}\) corner in the \(j\mathrm{th}\) tag at image time \(t_i\) is

$$\begin{aligned} \varvec{e}^{i,j,n}_{re} = \varvec{z}^{i,j,n}-{\varvec{\pi }} \left( \varvec{T}_{CB} \left( \varvec{T}_{WB}^i\right) ^{-1} \varvec{T}_{WA_j}\cdot {\varvec{l}}_j^n \right) \end{aligned}$$
(32)

1. Jacobian of the tag reprojection error with respect to the moving object pose \(\varvec{T}_{WB}\):

The reprojection error with respect to the rotational increment is:

$$\begin{aligned} \begin{aligned}&\varvec{e}^{i,j,n}_{re}\left( \varvec{R}_{WB}^i\mathrm{Exp}\left( \delta {\varvec{\phi }}_R^i\right) \right) \\&\quad = \varvec{z}^{i,j,n}-\pi \left( \varvec{T}^i_{CB}\left[ \begin{array}{cc} \left( {\varvec{R}}^i_{WB}\mathrm{Exp}\left( \delta {\varvec{\phi }}_R^i\right) \right) ^T &{} -\left( {\varvec{R}}^i_{WB}\mathrm{Exp}\left( \delta {\varvec{\phi }}_R^i\right) \right) ^T{}_W{\varvec{p}}^i_{WB}\\ {\varvec{0}}_{1\times 3} &{} 1 \end{array} \right] \varvec{T}_{WA_j}\cdot \varvec{l}_j^n \right) \\&\quad = \varvec{z}^{i,j,n}-\pi \left( \varvec{T}^i_{CB}\left[ \begin{array}{c} \mathrm{Exp}\left( -\delta {\varvec{\phi }}_R^i\right) {\varvec{R}}^{i\;T}_{WB}\left( {}_W{\varvec{l}}_{j\;1:3}^n-{}_W{\varvec{p}}^i_{WB} \right) \\ 1 \end{array}\right] \right) \end{aligned} \end{aligned}$$
(33)

where \({}_W{\varvec{l}}_j^n=\left[ \begin{array}{cc} {}_W{\varvec{l}}^n_{j\;1:3}&1 \end{array}\right] ^T={\varvec{T}}_{WA_j}\cdot {\varvec{l}}_j^n\). Using Eqs. (30) and (31), the Jacobian of the tag reprojection error with respect to the moving object orientation is:

$$\begin{aligned} \frac{\partial \varvec{e}^{i,j,n}_{re}}{\partial {\delta {\varvec{\phi }}_R^i}}=-{\varvec{J}}_{r\;j,n}{\varvec{T}}_{CB}^i\left[ \begin{array}{c} \left( {\varvec{R}}^{i\;T}_{WB}\left( {}_W{\varvec{l}}_{j\;1:3}^n-{}_W{\varvec{p}}^i_{WB} \right) \right) ^\wedge \\ {\varvec{0}}_{1\times 3} \end{array}\right] \end{aligned}$$
(34)

The reprojection error with respect to the translational increment is:

$$\begin{aligned} \begin{aligned}&\varvec{e}^{i,j,n}_{re}\left( {}_W{\varvec{p}}^i_{WB}+\delta {}_W{\varvec{p}}^i_{WB}\right) \\&\quad = \varvec{z}^{i,j,n}-\pi \left( \varvec{T}^i_{CB}\left[ \begin{array}{c} {\varvec{R}}^{i\;T}_{WB}\left( {}_W{\varvec{l}}_{j\;1:3}^n-{}_W{\varvec{p}}^i_{WB}-\delta {}_W{\varvec{p}}^i_{WB} \right) \\ 1 \end{array}\right] \right) \end{aligned} \end{aligned}$$
(35)

and the Jacobian of the tag reprojection error with respect to the moving object translation is:

$$\begin{aligned} \frac{\partial \varvec{e}^{i,j,n}_{re}}{\partial \delta {}_W{\varvec{p}}^i_{WB}}={\varvec{J}}_{r\;j,n}{\varvec{T}}_{CB}^i\left[ \begin{array}{c} {\varvec{R}}^{i\;T}_{WB}\\ {\varvec{0}}_{1\times 3} \end{array}\right] \end{aligned}$$
(36)

2. Jacobian of the tag reprojection error with respect to the tag pose \(\varvec{T}_{WA}\):

The reprojection error with respect to the SE(3) increment is:

$$\begin{aligned} \begin{aligned}&\varvec{e}^{i,j,n}_{re}\left( \varvec{T}_{WA_j}\mathrm{Exp}\left( \delta {\varvec{\xi }}_F^j\right) \right) \\&\quad = {\varvec{z}}^{i,j,n}-\pi \left( \varvec{T}_{CB} \left( \varvec{T}_{WB}^i\right) ^{-1} {\varvec{T}}_{WA_j}\mathrm{Exp}\left( \delta {\varvec{\xi }}_F^j\right) \cdot \varvec{l}_j^n \right) \end{aligned} \end{aligned}$$
(37)

so we can get the Jacobian of the tag reprojection error with respect to the tag pose using Eq. (30):

$$\begin{aligned} \frac{\partial \varvec{e}^{i,j,n}_{re}}{\partial {\delta {\varvec{\xi }}_F^j}}=-{\varvec{J}}_{r\;j,n}{\varvec{T}}_{CB}^i\left( {\varvec{T}}_{WB}^i\right) ^{-1}{\varvec{T}}_{WA_j}\left[ \begin{array}{cc} {\varvec{I}}_{3\times 3} &{} -{\varvec{l}}_{j\;1:3}^\wedge \\ {\varvec{0}}_{1\times 3} &{} {\varvec{0}}_{1\times 3} \end{array}\right] \end{aligned}$$
(38)
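
As a sanity check, the Jacobians of Eqs. (34), (36) and (38) can be verified against finite differences in the same way. The sketch below (ours, not from the paper) perturbs the body rotation, the body translation, and the tag pose in turn; the helper functions repeat the conventions assumed in the Appendix A sketch, and all test poses and the tag corner are arbitrary placeholder values.

```python
import numpy as np
from scipy.linalg import expm

def hat(v):
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def Exp(xi):                      # SE(3) exponential, xi = [rho; phi]
    W = np.zeros((4, 4)); W[:3, :3] = hat(xi[3:]); W[:3, 3] = xi[:3]
    return expm(W)

def proj(q):                      # unit-focal pinhole projection
    return q[:2] / q[2]

def dproj_dq(q):                  # 2x4 Jacobian of proj (J_r, padded)
    x, y, z = q[0], q[1], q[2]
    return np.array([[1/z, 0.0, -x/z**2, 0.0],
                     [0.0, 1/z, -y/z**2, 0.0]])

rng = np.random.default_rng(1)
T_CB = Exp(rng.normal(size=6) * 0.1)                   # body-to-camera extrinsics
T_WB = Exp(rng.normal(size=6) * 0.3)                   # moving object pose
T_WA = Exp(np.array([0.0, 0.0, 3.0, 0.0, 0.0, 0.0]))   # tag pose, 3 m ahead
l = np.array([0.1, 0.1, 0.0, 1.0])                     # tag corner, tag frame

def err(T_WB, T_WA):              # reprojection error; constant z dropped
    return -proj(T_CB @ np.linalg.inv(T_WB) @ T_WA @ l)

R, p = T_WB[:3, :3], T_WB[:3, 3]
J_r = dproj_dq(T_CB @ np.linalg.inv(T_WB) @ T_WA @ l)
lW = (T_WA @ l)[:3]

A = np.zeros((4, 3)); A[:3, :3] = hat(R.T @ (lW - p))
J34 = -J_r @ T_CB @ A                                  # Eq. (34), rotation
Bm = np.zeros((4, 3)); Bm[:3, :3] = R.T
J36 = J_r @ T_CB @ Bm                                  # Eq. (36), translation
M = np.zeros((4, 6)); M[:3, :3] = np.eye(3); M[:3, 3:] = -hat(l[:3])
J38 = -J_r @ T_CB @ np.linalg.inv(T_WB) @ T_WA @ M     # Eq. (38), tag pose

eps = 1e-6
for k in range(3):
    d = np.zeros(3); d[k] = eps
    Tp, Tm = T_WB.copy(), T_WB.copy()
    Tp[:3, :3], Tm[:3, :3] = R @ expm(hat(d)), R @ expm(hat(-d))
    assert np.allclose(J34[:, k], (err(Tp, T_WA) - err(Tm, T_WA)) / (2*eps), atol=1e-5)
    Tp, Tm = T_WB.copy(), T_WB.copy()
    Tp[:3, 3], Tm[:3, 3] = p + d, p - d
    assert np.allclose(J36[:, k], (err(Tp, T_WA) - err(Tm, T_WA)) / (2*eps), atol=1e-5)
for k in range(6):
    d = np.zeros(6); d[k] = eps
    assert np.allclose(J38[:, k],
                       (err(T_WB, T_WA @ Exp(d)) - err(T_WB, T_WA @ Exp(-d))) / (2*eps),
                       atol=1e-5)
```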


About this article


Cite this article

He, G., Zhong, S. & Guo, J. A lightweight and scalable visual-inertial motion capture system using fiducial markers. Auton Robot 43, 1895–1915 (2019). https://doi.org/10.1007/s10514-019-09834-7


Keywords

Navigation