Accurate 3D motion tracking by combining image alignment and feature matching

Chen, Shu; Liang, Luming; Ouyang, Jianquan; Yuan, Yuan

doi:10.1007/s11042-020-08966-8

Published: 05 May 2020

Volume 79, pages 21325–21343, (2020)

Multimedia Tools and Applications

Shu Chen^1,2, Luming Liang^3, Jianquan Ouyang^1,2 & Yuan Yuan^1,2

Abstract

We present a novel method to improve the accuracy of 3D motion tracking. In contrast to state-of-the-art tracking approaches, where the 3D structure of the target is commonly approximated by a CAD model, the proposed method builds the target model online with an improved Structure-from-Motion technique. Furthermore, tracking is carried out by three sequential trackers (a feature-based tracker, an image-alignment-based tracker, and a Particle Filter), which successively refine the tracking results. This coarse-to-fine scheme increases tracking accuracy. Moreover, our approach uses a keyframe strategy to prevent tracking drift; the insertion of a new keyframe is governed by a criterion designed to ensure a correct update. Thorough evaluations are performed on two public databases, the Biwi Head Pose dataset and the UPNA Head Pose Database. Comparisons show that the proposed method outperforms other state-of-the-art tracking approaches.
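
The coarse-to-fine idea in the abstract can be illustrated with a minimal sketch in which each tracker is a function that takes the current pose estimate and returns a refined one. Everything below is an assumption made for illustration and is not taken from the paper: the 6-element pose vector, the names `track_frame`, `particle_filter_stage` and `step_toward`, the exponential weighting, and all parameter values. The two `step_toward` stages are toy stand-ins for the feature-based and image-alignment-based trackers; only the last stage is a (single-step, resampling-free) particle filter.

```python
import math
import random
from typing import Callable, List, Sequence

Pose = List[float]  # assumed 6-DoF pose vector: [rx, ry, rz, tx, ty, tz]
Stage = Callable[[Pose], Pose]


def step_toward(target: Sequence[float], fraction: float) -> Stage:
    """Toy stand-in for a real tracker: move the pose a fixed fraction toward a target."""
    def refine(pose: Pose) -> Pose:
        return [p + fraction * (t - p) for p, t in zip(pose, target)]
    return refine


def particle_filter_stage(cost: Callable[[Sequence[float]], float],
                          n_particles: int = 200,
                          sigma: float = 0.05,
                          seed: int = 0) -> Stage:
    """Build a stage that perturbs the pose, weights the samples by cost, and averages them."""
    rng = random.Random(seed)

    def refine(pose: Pose) -> Pose:
        particles = [list(pose)]  # keep the incoming estimate as one candidate
        for _ in range(n_particles):
            particles.append([p + rng.gauss(0.0, sigma) for p in pose])
        costs = [cost(p) for p in particles]
        best = min(costs)
        # Subtract the best cost so the best particle always has weight 1.
        weights = [math.exp(-(c - best)) for c in costs]
        total = sum(weights)
        return [sum(w * p[i] for w, p in zip(weights, particles)) / total
                for i in range(len(pose))]

    return refine


def track_frame(initial_pose: Sequence[float], stages: Sequence[Stage]) -> Pose:
    """Run the stages in order; each one refines the output of the previous one."""
    pose = list(initial_pose)
    for stage in stages:
        pose = stage(pose)
    return pose


if __name__ == "__main__":
    target = [0.1, -0.2, 0.05, 1.0, 0.5, 2.0]  # hypothetical ground-truth pose

    def cost(p: Sequence[float]) -> float:
        # Hypothetical alignment cost: scaled squared distance to the target.
        return 50.0 * sum((a - b) ** 2 for a, b in zip(p, target))

    stages = [step_toward(target, 0.5), step_toward(target, 0.5),
              particle_filter_stage(cost)]
    print(track_frame([0.0] * 6, stages))
```

In a real system the cost would come from image measurements, for example a reprojection or photometric error, rather than from a known target pose; the sketch only shows how sequential stages hand their estimates to one another.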

Acknowledgments

This research is supported in part by the Natural Science Foundation of Hunan Province (No. 2017JJ2252).

Author information

Authors and Affiliations

School of Computer Science, Xiangtan University, Xiangtan, 411105, People’s Republic of China
Shu Chen, Jianquan Ouyang & Yuan Yuan
Key Laboratory of Intelligent Computing and Information Processing, Ministry of Education, Xiangtan, 411105, People’s Republic of China
Shu Chen, Jianquan Ouyang & Yuan Yuan
Applied Science Group, Microsoft, Redmond, WA, 98052, USA
Luming Liang

Corresponding author

Correspondence to Luming Liang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Chen, S., Liang, L., Ouyang, J. et al. Accurate 3D motion tracking by combining image alignment and feature matching. Multimed Tools Appl 79, 21325–21343 (2020). https://doi.org/10.1007/s11042-020-08966-8

Received: 07 July 2019
Revised: 15 April 2020
Accepted: 22 April 2020
Published: 05 May 2020
Issue Date: August 2020
DOI: https://doi.org/10.1007/s11042-020-08966-8
