Abstract
Head pose estimation plays an essential role in many high-level face analysis tasks. However, accurate and robust pose estimation with existing approaches remains challenging. In this paper, we propose a novel method for accurate three-dimensional (3D) head pose estimation with noisy depth maps and high-resolution color images that are typically produced by popular RGBD cameras such as the Microsoft Kinect. Our method combines the advantages of the high-resolution RGB image with the 3D information of the depth image. For better accuracy and robustness, features are first detected using only the color image, and then the 3D feature points used for matching are obtained by combining depth information. The outliers are then filtered with depth information using rules proposed for depth consistency, normal consistency, and re-projection consistency, which effectively eliminate the influence of depth noise. The pose parameters are then iteratively optimized using the Extended LM (Levenberg-Marquardt) method. Finally, a Kalman filter is used to smooth the parameters. To evaluate our method, we built a database of more than 10K RGBD images with ground-truth poses recorded using motion capture. Both qualitative and quantitative evaluations show that our method produces notably smaller errors than previous methods.
Acknowledgments
The authors gratefully acknowledge the editor and the anonymous reviewers for their comments, which helped us improve the paper, and also thank for their enormous help in revising this paper. This work is supported by the 863 Program of China (No. 2015AA016405), the NSF of China (Nos. 61572290, 61672326), and the Fundamental Research Funds of Shandong University (No. 2015JC051).
Appendix
Given the parameter vector Θ and the energy function Φ(Θ), we seek an update ΔΘ such that Φ(Θ + ΔΘ) decreases at each iteration until it reaches a minimum \(\Phi (\hat \Theta )\); \(\hat \Theta \) is then the optimal parameter. In the LM algorithm, ΔΘ is calculated as follows:
where J is the Jacobian matrix, \(\frac {\partial {\Phi (\Theta +{\Delta }\Theta )}}{\partial \Theta }\), and λ is the damping parameter that controls the step size. In our algorithm, the Jacobian matrix J reduces to a vector consisting of the directional derivatives along each parameter direction.
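For orientation, the standard damped form of the LM step can be written in the notation above as follows. This is a hedged reconstruction that assumes the usual Gauss-Newton-style update with Φ(Θ) treated as a single scalar residual; the identity matrix I is notation introduced here, and the exact form used by the authors may differ.

\[
\Delta \Theta = -\left(J^{\top} J + \lambda I\right)^{-1} J^{\top}\, \Phi (\Theta )
\]

With a small λ the step approaches a Gauss-Newton step, and with a large λ it approaches a short gradient-descent step.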
The Jacobian J cannot be computed analytically from partial derivatives, so we estimate it by sampling while solving for the parameters. We choose sufficiently small increments Δ_iΘ, i = 1, 2, ⋯, 6, one along each parameter direction.
We then evaluate Φ(Θ) and Φ(Θ + Δ_iΘ), i = 1, 2, ⋯, 6, by sampling, and approximate each directional derivative by the following formula:
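A forward-difference estimate consistent with this description would take the form below. The symbol \(J_i\) for the i-th component of J and the normalization by the increment length are assumptions made for illustration, not taken from the source.

\[
J_i \approx \frac{\Phi (\Theta + \Delta_i \Theta ) - \Phi (\Theta )}{\lVert \Delta_i \Theta \rVert }, \qquad i = 1, 2, \cdots , 6
\]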
The parameter λ is adapted at each iteration: if the computed ΔΘ decreases Φ(Θ), the update is accepted and λ is reduced; if Φ(Θ) increases, the update is rejected, λ is increased, and ΔΘ is recalculated.
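The procedure in this appendix can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `numerical_gradient` and `lm_minimize` are hypothetical, the step uses the standard damped form with Φ treated as a scalar residual, the factor of 10 for adapting λ is a common convention rather than a value from the paper, and the energy in the usage note is a toy quadratic rather than the paper's energy.

```python
import numpy as np


def numerical_gradient(phi, theta, eps=1e-6):
    """Estimate the vector J by forward differences, one increment per parameter."""
    f0 = phi(theta)
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps  # small increment along the i-th parameter direction
        grad[i] = (phi(theta + step) - f0) / eps
    return grad


def lm_minimize(phi, theta0, lam=1e-3, eps=1e-6, max_iter=200, tol=1e-12):
    """Minimize a non-negative scalar energy phi with a damped LM-style step.

    Assumed update: delta = -(J^T J + lam * I)^-1 J^T phi(theta).
    """
    theta = np.asarray(theta0, dtype=float)
    f = phi(theta)
    for _ in range(max_iter):
        J = numerical_gradient(phi, theta, eps)
        A = np.outer(J, J) + lam * np.eye(theta.size)
        delta = -np.linalg.solve(A, J * f)
        f_new = phi(theta + delta)
        if f_new < f:
            # Energy decreased: accept the update and reduce lambda.
            improvement = f - f_new
            theta, f = theta + delta, f_new
            lam = max(lam / 10.0, 1e-12)
            if improvement < tol:
                break
        else:
            # Energy did not decrease: reject, increase lambda, recompute.
            lam *= 10.0
            if lam > 1e12:
                break
    return theta, f
```

As a usage example, a toy six-parameter energy (three rotation and three translation values chosen arbitrarily here) can be minimized as follows:

```python
target = np.array([0.1, -0.2, 0.3, 1.0, 2.0, 3.0])
theta_hat, energy = lm_minimize(lambda t: float(np.sum((t - target) ** 2)), np.zeros(6))
```

Because the gradient is estimated with forward differences, the attainable accuracy is limited by the increment size `eps`.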
Cite this article
Li, C., Zhong, F., Zhang, Q. et al. Accurate and fast 3D head pose estimation with noisy RGBD images. Multimed Tools Appl 77, 14605–14624 (2018). https://doi.org/10.1007/s11042-017-5050-x
DOI: https://doi.org/10.1007/s11042-017-5050-x