
Accurate and fast 3D head pose estimation with noisy RGBD images

Multimedia Tools and Applications

Abstract

Head pose estimation plays an essential role in many high-level face analysis tasks. However, accurate and robust pose estimation remains challenging for existing approaches. In this paper, we propose a novel method for accurate three-dimensional (3D) head pose estimation from the noisy depth maps and high-resolution color images typically produced by popular RGBD cameras such as the Microsoft Kinect. Our method combines the high resolution of the RGB image with the 3D information of the depth image. For better accuracy and robustness, features are first detected using only the color image, and the 3D feature points used for matching are then obtained by incorporating the depth information. Outliers are subsequently filtered using depth information and rules we propose for depth consistency, normal consistency, and re-projection consistency, which effectively eliminate the influence of depth noise. The pose parameters are then iteratively optimized using the Extended LM (Levenberg-Marquardt) method. Finally, a Kalman filter is used to smooth the parameters. To evaluate our method, we built a database of more than 10K RGBD images with ground-truth poses recorded using motion capture. Both qualitative and quantitative evaluations show that our method produces notably smaller errors than previous methods.
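The abstract does not detail the Kalman filter used for the final smoothing step. The sketch below is a minimal version that assumes each of the six pose parameters is filtered independently with a constant-position (random-walk) model. The function names and the noise variances `q` and `r` are hypothetical; they are not values or interfaces taken from the paper.

```python
from typing import List, Sequence


def kalman_smooth(measurements: Sequence[float], q: float = 1e-3, r: float = 1e-2) -> List[float]:
    """Filter one pose parameter over time with a scalar random-walk Kalman filter.

    q: process-noise variance (hypothetical value); r: measurement-noise variance (hypothetical value).
    """
    if not measurements:
        return []
    x = float(measurements[0])  # state estimate, initialised from the first measurement
    p = r                       # variance of the initial estimate
    out = [x]
    for z in measurements[1:]:
        # Predict: the parameter is assumed constant, so only the uncertainty grows.
        p = p + q
        # Update: blend prediction and new measurement using the Kalman gain.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        out.append(x)
    return out


def smooth_poses(poses: Sequence[Sequence[float]], q: float = 1e-3, r: float = 1e-2) -> List[List[float]]:
    """Smooth a sequence of 6-DoF poses by filtering each parameter independently."""
    if not poses:
        return []
    columns = [kalman_smooth([pose[i] for pose in poses], q, r) for i in range(len(poses[0]))]
    return [list(frame) for frame in zip(*columns)]
```

A larger `r` relative to `q` trusts the per-frame estimates less and smooths more strongly, at the cost of added lag when the head moves quickly.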


Figs. 1–10 (figure images not included in this preview)



Acknowledgments

The authors gratefully acknowledge the editor and the anonymous reviewers for their comments, which helped us improve this paper, and thank them for their considerable help in revising it. This work is supported by the 863 Program of China (No. 2015AA016405), the NSF of China (Nos. 61572290 and 61672326), and the Fundamental Research Funds of Shandong University (No. 2015JC051).

Author information


Correspondence to Fan Zhong.

Appendix


Given the parameter vector Θ and the energy function Φ(Θ), we seek an increment ΔΘ such that Φ(Θ + ΔΘ) decreases, and we repeat this update until Φ reaches a minimum at \(\hat \Theta \), which is taken as the optimal parameter. In the LM algorithm, ΔΘ is calculated as follows:

$$ \Delta\Theta=-\left(J^{T}J+\lambda I\right)^{-1}J^{T}\left(\Phi(\Theta+\Delta\Theta)-\Phi(\Theta)\right) \tag{17} $$

where \(J = \frac{\partial \Phi}{\partial \Theta}\) is the Jacobian matrix evaluated at the current Θ, I is the identity matrix, and λ is the damping factor that controls the step size (a larger λ yields a shorter step). In our algorithm, Φ is a scalar energy, so the Jacobian J degenerates into a vector consisting of the directional derivatives of Φ along each parameter direction:

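$$ J=\left[\frac{\partial \Phi}{\partial \theta},\frac{\partial \Phi}{\partial \psi},\frac{\partial \Phi}{\partial \phi},\frac{\partial \Phi}{\partial t_{x}},\frac{\partial \Phi}{\partial t_{y}},\frac{\partial \Phi}{\partial t_{z}}\right]^{T} \tag{18} $$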
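For illustration, the damped update of Eq. (17) can be computed as shown below. This sketch assumes the standard textbook form of LM, in which the energy is a sum of squared residuals r(Θ) with Jacobian J, so that the update is −(JᵀJ + λI)⁻¹Jᵀr. It is not necessarily the authors' Extended LM variant, and `lm_step` is a hypothetical name.

```python
import numpy as np


def lm_step(J: np.ndarray, r: np.ndarray, lam: float) -> np.ndarray:
    """Levenberg-Marquardt increment for residuals r with Jacobian J (rows: residuals, cols: parameters)."""
    n = J.shape[1]
    # Solve (J^T J + lam * I) delta = -J^T r instead of forming the inverse explicitly.
    return -np.linalg.solve(J.T @ J + lam * np.eye(n), J.T @ r)
```

As λ approaches 0, the increment approaches the Gauss-Newton step. For large λ, it approaches the short gradient-descent step −Jᵀr/λ. This is why λ can act as a step-size control.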

The Jacobian J cannot be computed analytically from partial derivatives, so we estimate it numerically by sampling while solving for the parameters. We choose a sufficiently small increment for each of the six parameters:

$$ \begin{cases} \nabla_{\theta}=[d_{\theta},0,0,0,0,0]^{T}, \\ \nabla_{\psi}=[0,d_{\psi},0,0,0,0]^{T}, \\ \nabla_{\phi}=[0,0,d_{\phi},0,0,0]^{T}, \\ \nabla_{t_{x}}=[0,0,0,d_{t_{x}},0,0]^{T}, \\ \nabla_{t_{y}}=[0,0,0,0,d_{t_{y}},0]^{T}, \\ \nabla_{t_{z}}=[0,0,0,0,0,d_{t_{z}}]^{T}. \end{cases} \tag{19} $$

We then sample \(\Phi(\Theta)\) and \(\Phi(\Theta+\nabla_{i})\), \(i \in \{\theta, \psi, \phi, t_{x}, t_{y}, t_{z}\}\), and approximate each directional derivative by the following forward difference:

$$ \begin{cases} \frac{\partial \Phi(\Theta)}{\partial \theta} \approx \frac{\Phi(\Theta+\nabla_{\theta})-\Phi(\Theta)}{d_{\theta}}, \\ \frac{\partial \Phi(\Theta)}{\partial \psi} \approx \frac{\Phi(\Theta+\nabla_{\psi})-\Phi(\Theta)}{d_{\psi}}, \\ \frac{\partial \Phi(\Theta)}{\partial \phi} \approx \frac{\Phi(\Theta+\nabla_{\phi})-\Phi(\Theta)}{d_{\phi}}, \\ \frac{\partial \Phi(\Theta)}{\partial t_{x}} \approx \frac{\Phi(\Theta+\nabla_{t_{x}})-\Phi(\Theta)}{d_{t_{x}}}, \\ \frac{\partial \Phi(\Theta)}{\partial t_{y}} \approx \frac{\Phi(\Theta+\nabla_{t_{y}})-\Phi(\Theta)}{d_{t_{y}}}, \\ \frac{\partial \Phi(\Theta)}{\partial t_{z}} \approx \frac{\Phi(\Theta+\nabla_{t_{z}})-\Phi(\Theta)}{d_{t_{z}}}. \end{cases} \tag{20} $$
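The sampling scheme of Eqs. (19) and (20) amounts to a forward-difference gradient that costs one extra evaluation of Φ per parameter. Below is a minimal sketch using only the standard library. The function name and the increments used in practice are assumptions, because the paper does not state values for the increments d here.

```python
from typing import Callable, List, Sequence


def finite_difference_jacobian(phi: Callable[[Sequence[float]], float],
                               theta: Sequence[float],
                               steps: Sequence[float]) -> List[float]:
    """Forward-difference estimate of dPhi/dTheta, one entry per parameter (cf. Eq. 20)."""
    if len(theta) != len(steps):
        raise ValueError("need exactly one increment per parameter")
    base = phi(theta)  # Phi(Theta), sampled once and reused for every parameter
    grad = []
    for i, d in enumerate(steps):
        shifted = list(theta)
        shifted[i] += d  # Theta + nabla_i: only parameter i is perturbed by d
        grad.append((phi(shifted) - base) / d)
    return grad
```

The increment trades truncation error, which grows with d, against floating-point cancellation, which grows as d shrinks. Rotation angles and translations may therefore warrant different increments.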

The damping factor λ is adapted at each iteration. If the update ΔΘ decreases Φ(Θ), ΔΘ is accepted and λ is reduced. If Φ(Θ) increases, ΔΘ is rejected, λ is increased, and ΔΘ is recalculated.
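The accept/reject rule for λ can be combined with the pieces above into a complete Levenberg-Marquardt loop. This is a sketch under stated assumptions. The energy is written as a sum of squared residuals, the Jacobian is estimated by forward differences as in Eq. (20), and λ is divided or multiplied by a hypothetical factor of 10. The function names, defaults, and stopping criteria are illustrative and are not taken from the paper.

```python
import numpy as np


def numeric_jacobian(residual, theta, steps):
    """Forward-difference Jacobian: column i approximates d residual / d theta_i."""
    r0 = residual(theta)
    J = np.empty((r0.size, theta.size))
    for i, d in enumerate(steps):
        shifted = theta.copy()
        shifted[i] += d  # perturb only parameter i
        J[:, i] = (residual(shifted) - r0) / d
    return J


def levenberg_marquardt(residual, theta0, steps, lam=1e-3, factor=10.0,
                        max_iter=100, tol=1e-12, lam_max=1e12):
    """Minimise ||residual(theta)||^2, adapting the damping factor lam (hypothetical defaults)."""
    theta = np.asarray(theta0, dtype=float).copy()
    steps = np.asarray(steps, dtype=float)
    r = residual(theta)
    cost = float(r @ r)  # Phi(Theta) written as a sum of squared residuals
    eye = np.eye(theta.size)
    for _ in range(max_iter):
        J = numeric_jacobian(residual, theta, steps)
        A, g = J.T @ J, J.T @ r
        while True:
            candidate = theta - np.linalg.solve(A + lam * eye, g)
            r_new = residual(candidate)
            new_cost = float(r_new @ r_new)
            if new_cost < cost:
                # Phi decreased: accept the step and reduce the damping.
                improvement = cost - new_cost
                theta, r, cost = candidate, r_new, new_cost
                lam /= factor
                break
            # Phi did not decrease: reject, increase the damping, recompute the step.
            lam *= factor
            if lam > lam_max:
                return theta, cost
        if improvement < tol:
            break
    return theta, cost
```

For example, fitting `theta[0] * x + theta[1]` to samples of `2 * x + 1` from a zero start should return parameters close to `[2, 1]`. Capping λ with `lam_max` guarantees that the inner reject loop terminates once no further decrease is possible.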


About this article


Cite this article

Li, C., Zhong, F., Zhang, Q. et al. Accurate and fast 3D head pose estimation with noisy RGBD images. Multimed Tools Appl 77, 14605–14624 (2018). https://doi.org/10.1007/s11042-017-5050-x


  • DOI: https://doi.org/10.1007/s11042-017-5050-x
