Abstract
We present a novel method to improve the accuracy of 3D motion tracking. In contrast to state-of-the-art tracking approaches, in which the 3D structure of the target is commonly approximated by a CAD model, the proposed method builds the target model with an improved online Structure-from-Motion technique. Tracking is performed by three sequential trackers (a feature-based tracker, an image-alignment-based tracker and a particle filter), which successively refine the tracking results. This coarse-to-fine scheme increases tracking accuracy. Moreover, our approach uses a keyframe strategy to prevent tracking drift; insertion of a new keyframe is governed by a criterion that ensures a correct update. Thorough evaluations are performed on two public databases, the Biwi Head Pose dataset and the UPNA Head Pose Database. Comparisons show that the proposed method outperforms other state-of-the-art tracking approaches.
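The coarse-to-fine cascade described above can be illustrated with a minimal sketch. It is not the authors' implementation: the pose is a toy 2-vector, the "image" is replaced by a quadratic cost, and every name (`coarse_stage`, `alignment_stage`, `particle_stage`, `track_frame`, `should_insert_keyframe`) and every threshold is a hypothetical stand-in. The rule that a stage's output is kept only if it does not increase the cost, and the keyframe criterion (low cost and far from stored keyframes), are assumptions made for illustration; the abstract does not specify either.

```python
import math
import random
from typing import Callable, List, Tuple

Pose = List[float]              # toy pose; a real tracker would use rotation + translation
Cost = Callable[[Pose], float]  # toy alignment error to minimise


def coarse_stage(pose: Pose, cost: Cost, rng: random.Random) -> Pose:
    """Stand-in for the feature-based tracker: coarse per-coordinate search."""
    best = list(pose)
    for i in range(len(best)):
        candidates = [best[:i] + [best[i] + d] + best[i + 1:]
                      for d in (-1.0, -0.5, 0.0, 0.5, 1.0)]
        best = min(candidates, key=cost)
    return best


def alignment_stage(pose: Pose, cost: Cost, rng: random.Random) -> Pose:
    """Stand-in for image alignment: a few numerical gradient-descent steps."""
    x, h, step = list(pose), 1e-5, 0.25
    for _ in range(3):
        grad = []
        for i in range(len(x)):
            hi = x[:i] + [x[i] + h] + x[i + 1:]
            lo = x[:i] + [x[i] - h] + x[i + 1:]
            grad.append((cost(hi) - cost(lo)) / (2 * h))  # central difference
        x = [xi - step * g for xi, g in zip(x, grad)]
    return x


def particle_stage(pose: Pose, cost: Cost, rng: random.Random) -> Pose:
    """Stand-in for the particle filter: sample, weight by likelihood, average."""
    particles = [[p + rng.gauss(0.0, 0.02) for p in pose] for _ in range(500)]
    weights = [math.exp(-cost(q) / (2 * 0.01 ** 2)) for q in particles]
    total = sum(weights)
    if total == 0.0:            # all weights underflowed: keep the input pose
        return list(pose)
    return [sum(w * q[i] for w, q in zip(weights, particles)) / total
            for i in range(len(pose))]


def track_frame(pose: Pose, cost: Cost,
                rng: random.Random) -> Tuple[Pose, List[Tuple[str, float]]]:
    """Run the three stages in sequence and record the cost after each one."""
    trace = [("initial", cost(pose))]
    stages = (("feature", coarse_stage),
              ("alignment", alignment_stage),
              ("particle", particle_stage))
    for name, stage in stages:
        refined = stage(pose, cost, rng)
        if cost(refined) <= cost(pose):   # assumed guard: keep a stage only if it helps
            pose = refined
        trace.append((name, cost(pose)))
    return pose, trace


def should_insert_keyframe(pose: Pose, final_cost: float, keyframes: List[Pose],
                           max_cost: float = 1e-3, min_dist: float = 0.5) -> bool:
    """Hypothetical criterion: well-aligned pose that is far from every keyframe."""
    far = all(math.dist(pose, k) >= min_dist for k in keyframes)
    return final_cost <= max_cost and far


if __name__ == "__main__":
    target = [0.3, -1.2]        # made-up ground-truth pose for the demo
    cost = lambda p: sum((a - b) ** 2 for a, b in zip(p, target))
    pose, trace = track_frame([0.0, 0.0], cost, random.Random(0))
    for name, c in trace:
        print(f"{name:10s} cost={c:.6f}")
    print("insert keyframe:", should_insert_keyframe(pose, trace[-1][1], []))
```

Each stage receives the previous stage's estimate, so a cheap, wide-range search hands a good starting point to the more precise but more local refiners. That is the usual motivation for a coarse-to-fine ordering.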
Acknowledgments
This research is supported in part by the Natural Science Foundation of Hunan Province (No. 2017JJ2252).
Cite this article
Chen, S., Liang, L., Ouyang, J. et al. Accurate 3D motion tracking by combining image alignment and feature matching. Multimed Tools Appl 79, 21325–21343 (2020). https://doi.org/10.1007/s11042-020-08966-8
DOI: https://doi.org/10.1007/s11042-020-08966-8