Skip to main content
Log in

Accurate 3D motion tracking by combining image alignment and feature matching

  • Published:
Multimedia Tools and Applications Aims and scope Submit manuscript

Abstract

We presents a novel method to improve the accuracy of 3D motion tacking. In contrast to the state-of-the-art tracking approaches, where the 3D structure of target is commonly approximated by a CAD model, the proposed method establishes the target model by an online improved Structure-from-Motion technique. Furthermore, the tracking is implemented by three sequential trackers (feature-based tracker, image-alignment-based tracker and Particle Filter), which continually refine the tracking results. This coarse-to-fine method increases the accuracy of tracking. Moreover, our approach uses keyframe strategy to prevent tracking drift, the new keyframe insertion is determined by a criterion which can ensure a correct update. Thorough evaluations are performed on two public databases, the Biwi Head Pose dataset and the UPNA Head Pose Database. Comparisons illustrate that the proposed method achieves better performance with respect to other state-of-the-art tracking approaches.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11

Similar content being viewed by others

References

  1. Alvarez L, Weickert J, Sanchez J (2000) Reliable estimation of dense optical flow fields with large displacements. Int J Comput Vis 39(1):41–56

    Article  MATH  Google Scholar 

  2. Ariz M, Bengoechea JJ, Villanueva A, Cabeza R (2016) A novel 2D/3D database with automatic face annotation for head tracking and pose estimation. Comput Vis Image Underst 148(3):201–210

    Article  Google Scholar 

  3. Arqub OA, Abo-Hammour Z (2014) Numerical solution of systems of second-order boundary value problems using continuous genetic algorithm. Inf Sci 279:396–415

    Article  MathSciNet  MATH  Google Scholar 

  4. Arqub OA (2017) Adaptation of reproducing kernel algorithm for solving fuzzy Fredholm-Volterra integrodifferential equations. Neural Comput Appl 28:1591–1610

    Article  Google Scholar 

  5. Arqub OA, AL-Smadi M, Momani S, Hayat T (2016) Numerical solutions of fuzzy differential equations using reproducing kernel Hilbert space method. Soft Comput 20:3283–3302

    Article  MATH  Google Scholar 

  6. Baltzakis H, Pateraki M, Trahanias P (2012) Visual tracking of hands, faces and facial features. Mach Vis Appl 23(6):1141–1157

    Article  Google Scholar 

  7. Bregler C, Malik J, Pullen K (2004) Twist based acquisition and tracking of animal and human kinematics. Int J Comput Vis 56(3):179–194

    Article  Google Scholar 

  8. Brox T, Rosenhahn B, Gall J (2010) Combined region and motion-based 3D tracking of rigid and articulated objects. IEEE Trans Pattern Anal Mach Intell 32 (3):402–415

    Article  Google Scholar 

  9. Cagniart C, Boyer E, Ilic S (2010) Free-form mesh tracking: a patch-based approach. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp 1339–1346

  10. Cai Y, Ge L, Cai J, Yuan J (2018) Weakly-supervised 3d hand pose estimation from monocular rgb images. In: European Conference on Computer Vision, pp 678–694

  11. Cao C, Weng Y, Zhou S, Tong Y, Zhou K (2014) Facewarehouse: a 3D facial expression database for visual computing. IEEE Trans Vis Comput Graph 20 (3):413–425

    Article  Google Scholar 

  12. Chen S, Liang W, Wu L (2013) Recovering upper-body motion using a reinitialization particle filter. J Electron Imaging 22(3):033005

  13. Chen S, Liang L, Liang W, Foroosh H (2016) 3D pose tracking with multi-template warping and SIFT correspondences. IEEE Trans Circ Syst Video Technol 26(1):2043–2055

    Article  Google Scholar 

  14. Concha A, Civera J (2014) Using superpixels in monocular SLAM. In: Proceedings of International Conference on Robotics and Automation, pp 365–372

  15. Cootes T, Edwards G, Taylor C (2001) Active appearance models. IEEE Trans Pat Anal Mach Intel 23(6):681–684

    Article  Google Scholar 

  16. DeMenthon DF, Davis LS (1995) Model-based object pose in 25 lines of code. Int J Comput Vis 15(1):123–141

    Article  Google Scholar 

  17. Fanelli G, Dantone M, Gall J, Fossati A, Gool LV (2013) Random forests for real time 3D face analysis. Int J Comput Vis 101(3):437–458

    Article  Google Scholar 

  18. Gibson S, Cook J, Howard T, Hubbold R, Oram D (2002) Accurate camera calibration for off-line, video-based augmented reality. In: IEEE and ACM International Symposium on Mixed and Augmented Reality, pp 37–46

  19. Han S, Liu B, Wang R, Ye Y, Twigg CD, Kin K (2018) Online optical marker-based hand tracking with deep labels. ACM Trans Graph 37(4):1:1–1:10

  20. Hartley R, Zisserman A (2004) Multiple view geometry in computer vision, 2nd ed. Cambridge University Press

  21. Hu H, Cai Q, Wang D, Lin J, Sun M, Krahenbuhl P, Darrell T, Yu F (2019) Joint monocular 3D vehicle detection and tracking. In: Proceedings of IEEE International Conference on Computer Vision, pp 5389–5398

  22. Kalman RE (1960) A new approach to linear filtering and prediction problems. J Basic Eng 28D:35–45

    Article  MathSciNet  Google Scholar 

  23. Kanazawa A, Black MJ, Jacobs DW, Malik J (2018) End-to-end recovery of human shape and pose. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp 7122–7131

  24. Kim J, Liu C, Sha F, Grauman K (2013) Deformable spatial pyramid matching for fast dense correspondences. In: Proceedings of IEEE Conf. on Computer Vision and Pattern Recognition, pp 2307–2314

  25. Li T, Bolkart T, Black MJ, Li H, Romero J (2017) Learning a model of facial shape and expression from 4d scans. ACM Trans Graph 36(6):194:1–194:17

  26. Li P, Qin T, Shen S (2018) Stereo vision-based semantic 3d object and ego-motion tracking for autonomous driving. In: European Conference on Computer Vision, pp 664–679

  27. Lou J, Tan T, Hu W, Yang H, Maybank SJ (2012) 3-D model-based vehicle tracking. IEEE Trans Image Process 14(10):1561–1569

    Google Scholar 

  28. Lowe DG (2004) Distinctive image features from scale-invariant key points. Int J Comput Vis 60(2):91–110

    Article  Google Scholar 

  29. Matthews I, Baker S (2004) Active appearance models revisited. Int J Comput Vis 60(2):135–164

    Article  Google Scholar 

  30. Morel J, Yu G (2009) ASIFT: A new framework for fully affine invariant image comparison. SIAM J Imag Sci 2(2):438–469

    Article  MathSciNet  MATH  Google Scholar 

  31. Morency LP, Whitehill J, Movellan J (2008) Generalized adaptive view-based appearance model: Integrated framework for monocular head pose estimation. In: Proceedings of IEEE International Conference on Automatic Face and Gesture Recognition, pp 1–8

  32. Mur-Artal R, Montiel JMM, Tardos JD (2015) ORB-SLAM: A versatile and accurate monocular SLAM system. IEEE Trans Robot 31(5):1147–1163

    Article  Google Scholar 

  33. Nister D (2004) An efficient solution to the five-point relative pose problem. IEEE Trans Pattern Anal Mach Intell 26(6):756–777

    Article  Google Scholar 

  34. Opromolla R, Fasano G, Rufino G, Grassi M (2017) Pose estimation for spacecraft relative navigation using Model-Based algorithms. IEEE Trans Aerosp Electron Syst 53(1):431–447

    Article  Google Scholar 

  35. Orozco JGJ, Rudovic O, Pantic M (2013) Hierarchical on-line appearance-based tracking for 3D head pose, eyebrows, lips, eyelids and irises. Image and Vis Comput 31 (4):322–340

    Article  Google Scholar 

  36. Pauwelsm K, Rubio L, Diaz J (2013) Real-time model based rigid object pose estimation and tracking combining dense and sparse visual cues. In: Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, pp 2347–2354

  37. Pham HX, Chen C, Dao LN, Pavlovic V, Cai J, Cham T (2015) Robust performance-driven 3D face tracking in long range depth scenes. arXiv

  38. Ranjan A, Bolkart T, Sanyal S, Black MJ (2018) Generating 3d faces using convolutional mesh autoencoders. In: European Conference on Computer Vision, pp 725–741

  39. Romero J, Tzionas D, Black MJ (2017) Embodied hands: modeling and capturing hands and bodies together. ACM Trans Graph 36(6):245:1–245:17

  40. Scheidegger S, Benjaminsson J, Rosenberg E, Krishnan A, Granstrom K (2018) Mono-camera 3d multi-object tracking using deep learning detections and PMBM filtering. In: IEEE Intelligent Vehicles Symposium, pp 433–440

  41. Vacchetti L, Lepetit V, Fua P (2004) Stable real-time 3D tracking using online and offline information. IEEE Trans Pattern Anal Mach Intell 26(10):1385–1391

    Article  Google Scholar 

  42. Wan C, Probst T, Gool LV, Yao A (2019) Self-supervised 3D hand pose estimation through training by fitting. In: Proceedings of IEEE Conf. on Computer Vision and Pattern Recognition, pp 1339–1346

  43. Wang Y, Liu Y, Tong X, Dai Q, Tan P (2018) Outdoor markerless motion capture with sparse handheld video cameras. IEEE Trans Vis Comput Graph 24(5):1856–1866

    Article  Google Scholar 

  44. Weinzaepfel P, Revaud J, Harchaoui Z, Schmid C (2013) Deepflow: Large displacement optical flow with deep matching. In: Proceedings of IEEE International Conference on Computer Vision, pp 1385–1392

  45. Xiang D, Joo H, Sheikh Y (2019) Monocular total capture: posing face, body, and hands in the wild. In: Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, pp 10957–10966

  46. Xu W, Chatterjee A, Zollhoefer M, Rhodin H, Mehta D, Seidel HP, Theobalt C (2018) Monoperfcap: Human performance capture from monocular video. ACM Trans Graph 1(1):1:1–1:16

  47. Ye Z, Ye H (2020) Particle filter algorithm based spatial motion tracking of football landing location. Multimed Tools Appl 79:5053–5063

    Article  Google Scholar 

  48. Zhang G, Qin X, Hua W, Wong TT, Heng PA, Bao H (2007) Robust metric reconstruction from challenging video sequences. In: Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, pp 1–8

Download references

Acknowledgments

This research is supported in part by the Natural Science Foundation of Hunan Province (No. 2017JJ2252).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Luming Liang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Chen, S., Liang, L., Ouyang, J. et al. Accurate 3D motion tracking by combining image alignment and feature matching. Multimed Tools Appl 79, 21325–21343 (2020). https://doi.org/10.1007/s11042-020-08966-8

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11042-020-08966-8

Keywords

Navigation