Accurate and fast 3D head pose estimation with noisy RGBD images

Li, Chenglong; Zhong, Fan; Zhang, Qian; Qin, Xueying

doi:10.1007/s11042-017-5050-x

Published: 14 August 2017

Volume 77, pages 14605–14624 (2018)

Multimedia Tools and Applications

Abstract

Head pose estimation plays an essential role in many high-level face analysis tasks. However, accurate and robust pose estimation with existing approaches remains challenging. In this paper, we propose a novel method for accurate three-dimensional (3D) head pose estimation from the noisy depth maps and high-resolution color images typically produced by popular RGBD cameras such as the Microsoft Kinect. Our method combines the detail of the high-resolution color image with the 3D information of the depth image. For better accuracy and robustness, features are first detected in the color image alone, and the 3D feature points used for matching are then obtained by adding the corresponding depth information. Outliers are then removed using three proposed depth-based rules (depth consistency, normal consistency, and re-projection consistency), which effectively suppress the influence of depth noise. The pose parameters are then optimized iteratively using the Extended LM (Levenberg-Marquardt) method. Finally, a Kalman filter is used to smooth the estimated pose parameters. To evaluate our method, we built a database of more than 10K RGBD images with ground-truth poses recorded using motion capture. Both qualitative and quantitative evaluations show that our method produces notably smaller errors than previous methods.
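The abstract does not say which state model the Kalman filter uses. As an illustration only, the NumPy sketch below assumes the simplest option, an independent random-walk (constant-value) filter for each of the six pose parameters; the class name `PoseKalmanSmoother` and the noise variances `q` and `r` are hypothetical, not values from the paper.

```python
import numpy as np


class PoseKalmanSmoother:
    """Hypothetical per-parameter random-walk Kalman filter for a 6-D pose vector."""

    def __init__(self, q=1e-3, r=1e-2):
        self.q = q      # process-noise variance (assumed value)
        self.r = r      # measurement-noise variance (assumed value)
        self.x = None   # filtered pose estimate
        self.p = None   # estimate variance, one entry per parameter

    def update(self, z):
        """Fuse one per-frame pose measurement z and return the smoothed pose."""
        z = np.asarray(z, dtype=float)
        if self.x is None:
            # First frame: trust the measurement and start with its variance.
            self.x = z.copy()
            self.p = np.full(z.shape, self.r)
            return self.x.copy()
        p_pred = self.p + self.q            # predict: pose unchanged, uncertainty grows
        k = p_pred / (p_pred + self.r)      # Kalman gain per parameter
        self.x = self.x + k * (z - self.x)  # correct toward the new measurement
        self.p = (1.0 - k) * p_pred         # posterior variance
        return self.x.copy()


smoother = PoseKalmanSmoother()
for frame in range(50):
    noisy = np.full(6, 1.0 if frame % 2 == 0 else -1.0)  # pose jittering around 0
    smoothed = smoother.update(noisy)
print(np.abs(smoothed).max() < 0.5)  # True: the jitter is strongly attenuated
```

The ratio of `q` to `r` trades lag for smoothness in a random-walk model; a constant-velocity state would follow fast head turns more closely at the cost of a larger state vector.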

Acknowledgments

The authors gratefully acknowledge the editor and the anonymous reviewers for their comments and for their enormous help in revising this paper. This work is supported by the 863 Program of China (No. 2015AA016405), the NSF of China (Nos. 61572290 and 61672326), and the Fundamental Research Funds of Shandong University (No. 2015JC051).

Author information

Authors and Affiliations

School of Computer Science and Technology, Shandong University, No.1500 Shunhua Road, High-tech Zone, Jinan, China
Chenglong Li, Fan Zhong, Qian Zhang & Xueying Qin

Corresponding author

Correspondence to Fan Zhong.

Appendix

For the given parameter vector Θ and energy function Φ(Θ), we need to find an update ΔΘ such that Φ(Θ + ΔΘ) decreases at each iteration until Φ reaches a minimum at $\hat \Theta $, which is then the optimal parameter. In the LM algorithm, ΔΘ is calculated as follows:

$$ {\Delta}\Theta=-\left({J}^{T}{J}+\lambda{I}\right)^{-1}{J}^{T}\,\Phi(\Theta) $$

(17)

where $J=\frac {\partial \Phi }{\partial \Theta }$ is the Jacobian of Φ evaluated at the current Θ, I is the identity matrix, and λ is the damping factor: a larger λ yields a shorter step that is closer to gradient descent. In our algorithm, Φ is a scalar, so the Jacobian J degenerates into a row vector whose entries are the partial derivatives of Φ along each parameter direction:

$$ {J}=\left[ \frac{\partial \Phi}{\partial \theta},\frac{\partial \Phi}{\partial \psi},\frac{\partial \Phi}{\partial \phi},\cdots\right] $$

(18)
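For a single scalar residual, the standard damped LM step $\Delta\Theta=-(J^{T}J+\lambda I)^{-1}J^{T}\Phi(\Theta)$, with J a 1×6 row, can be written without a matrix inverse: by the push-through identity, $(J^{T}J+\lambda I)^{-1}J^{T} = J^{T}/(\lambda + JJ^{T})$. The NumPy sketch below is a minimal illustration, assuming J is stored as a length-6 array; it computes the step both ways so the equivalence can be checked. The function names are hypothetical.

```python
import numpy as np


def lm_step_matrix(J, phi_value, lam):
    """Damped LM step -(J^T J + lam*I)^{-1} J^T Phi for one scalar residual."""
    J = np.asarray(J, dtype=float).reshape(1, -1)  # 1 x n row Jacobian
    A = J.T @ J + lam * np.eye(J.shape[1])         # n x n damped normal matrix
    return -np.linalg.solve(A, J.T * phi_value).ravel()


def lm_step_closed_form(J, phi_value, lam):
    """Same step via the push-through identity: -J^T Phi / (lam + J J^T)."""
    J = np.asarray(J, dtype=float)
    return -J * phi_value / (lam + J @ J)


J = np.array([2.0, 0.0, 0.0, 0.0, 0.0, 0.0])
print(lm_step_matrix(J, 4.0, 1.0))       # first entry -1.6, the rest zero
print(lm_step_closed_form(J, 4.0, 1.0))  # same step
```

The closed form avoids solving a 6×6 system and stays well conditioned as λ becomes small, whereas $J^{T}J$ alone is rank one and the damped matrix approaches singularity.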

The Jacobian J cannot be computed from analytical partial derivatives, so we estimate it by sampling while solving for the parameters. We choose sufficiently small increments

$$\begin{array}{@{}rcl@{}} \left\{\begin{array}{llllll} \nabla_{\theta}=[d_{\theta},0,0,0,0,0]^{T} , \\ \nabla_{\psi}=[0,d_{\psi},0,0,0,0]^{T} , \\ \nabla_{\phi}=[0,0,d_{\phi},0,0,0]^{T} , \\ \nabla_{t_{x}}=[0,0,0,d_{t_{x}},0,0]^{T} , \\ \nabla_{t_{y}}=[0,0,0,0,d_{t_{y}},0]^{T} , \\ \nabla_{t_{z}}=[0,0,0,0,0,d_{t_{z}}]^{T} , \end{array}\right. \end{array} $$

(19)

We then evaluate Φ(Θ) and Φ(Θ + ∇_i) by sampling, for each of the six increments ∇_i in Eq. (19), and approximate each directional derivative by the following forward difference:

$$\begin{array}{@{}rcl@{}} \left\{\begin{array}{llllll} \frac{\partial{\Phi(\Theta)}}{\partial{\theta}} \approx \frac{\Phi(\Theta+\nabla_{\theta})-\Phi(\Theta)}{d_{\theta}}, \\ \frac{\partial{\Phi(\Theta)}}{\partial{\psi}} \approx \frac{\Phi(\Theta+\nabla_{\psi})-\Phi(\Theta)}{d_{\psi}}, \\ \frac{\partial{\Phi(\Theta)}}{\partial{\phi}} \approx \frac{\Phi(\Theta+\nabla_{\phi})-\Phi(\Theta)}{d_{\phi}}, \\ \frac{\partial{\Phi(\Theta)}}{\partial{t_{x}}} \approx \frac{\Phi(\Theta+\nabla_{t_{x}})-\Phi(\Theta)}{d_{t_{x}}}, \\ \frac{\partial{\Phi(\Theta)}}{\partial{t_{y}}} \approx \frac{\Phi(\Theta+\nabla_{t_{y}})-\Phi(\Theta)}{d_{t_{y}}}, \\ \frac{\partial{\Phi(\Theta)}}{\partial{t_{z}}} \approx \frac{\Phi(\Theta+\nabla_{t_{z}})-\Phi(\Theta)}{d_{t_{z}}}, \end{array}\right. \end{array} $$

(20)
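This sampling scheme is a forward-difference estimate that costs one extra evaluation of Φ per parameter. The sketch below assumes a callable `phi` returning the scalar energy for a 6-vector ordered (θ, ψ, φ, t_x, t_y, t_z) as in Eq. (19); the paper's actual energy is not reproduced here, so a toy function stands in for it, and `fd_jacobian` is a hypothetical name.

```python
import numpy as np


def fd_jacobian(phi, theta, steps, phi_theta=None):
    """Forward-difference estimate of dPhi/dTheta for a scalar energy phi."""
    theta = np.asarray(theta, dtype=float)
    base = phi(theta) if phi_theta is None else phi_theta  # reuse Phi(Theta) if known
    J = np.empty(theta.size)
    for i, d in enumerate(steps):
        probe = theta.copy()
        probe[i] += d                   # Theta + nabla_i: perturb one parameter
        J[i] = (phi(probe) - base) / d  # (Phi(Theta + nabla_i) - Phi(Theta)) / d_i
    return J


def toy_phi(t):
    """Toy energy standing in for the paper's Phi: quadratic in theta, linear in t_z."""
    return t[0] ** 2 + 3.0 * t[5]


theta = np.array([1.0, 0.0, 0.0, 0.0, 0.0, 2.0])
print(fd_jacobian(toy_phi, theta, np.full(6, 1e-6)))  # approximately [2, 0, 0, 0, 0, 3]
```

The forward difference has a truncation error proportional to d_i; a central difference, (Φ(Θ + ∇_i) − Φ(Θ − ∇_i))/(2d_i), is more accurate at the cost of six extra evaluations per Jacobian.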

The damping factor λ is adapted at each iteration: if ΔΘ decreases Φ(Θ), the step is accepted and λ is reduced; if Φ(Θ) increases, the step is rejected, λ is increased, and ΔΘ is recalculated.
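The accept/reject schedule for λ can be sketched as a short loop that combines a forward-difference Jacobian with the damped step. This is an illustration under stated assumptions, not the authors' implementation: the factor of 10 for adjusting λ, the initial λ, the stopping tolerance, and the toy energy (squared distance to a known pose) are all hypothetical choices, and the step uses the closed form $-J^{T}\Phi/(\lambda + JJ^{T})$, which equals $-(J^{T}J+\lambda I)^{-1}J^{T}\Phi$ for a scalar residual.

```python
import numpy as np


def fd_jacobian(phi, theta, steps, base):
    """Forward-difference gradient of the scalar energy phi at theta."""
    J = np.empty(theta.size)
    for i, d in enumerate(steps):
        probe = theta.copy()
        probe[i] += d
        J[i] = (phi(probe) - base) / d
    return J


def lm_minimize(phi, theta0, steps, lam=1e-3, max_iter=100, tol=1e-10):
    """LM iterations on a scalar energy with adaptive damping lam (assumed schedule)."""
    theta = np.asarray(theta0, dtype=float)
    f = phi(theta)
    J = fd_jacobian(phi, theta, steps, f)
    for _ in range(max_iter):
        if f < tol:
            break
        delta = -J * f / (lam + J @ J)  # damped step for a scalar residual
        f_new = phi(theta + delta)
        if f_new < f:
            # Energy decreased: accept the step and reduce the damping.
            theta, f = theta + delta, f_new
            lam /= 10.0
            J = fd_jacobian(phi, theta, steps, f)  # re-estimate J at the new pose
        else:
            # Energy increased: reject, increase the damping, and recompute the step.
            lam *= 10.0
    return theta, f


target = np.array([0.1, -0.2, 0.05, 10.0, -5.0, 30.0])  # hypothetical ground-truth pose


def toy_phi(t):
    """Toy energy: squared distance between the pose t and the target pose."""
    return float(np.sum((t - target) ** 2))


theta_hat, energy = lm_minimize(toy_phi, np.zeros(6), np.full(6, 1e-6))
print(theta_hat, energy)
```

Because only steps that lower Φ are accepted, the loop never returns a pose with higher energy than its starting point; with a forward-difference Jacobian, the attainable accuracy is ultimately limited by the increments d_i.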

About this article

Cite this article

Li, C., Zhong, F., Zhang, Q. et al. Accurate and fast 3D head pose estimation with noisy RGBD images. Multimed Tools Appl 77, 14605–14624 (2018). https://doi.org/10.1007/s11042-017-5050-x

Received: 07 September 2016
Revised: 08 June 2017
Accepted: 24 July 2017
Published: 14 August 2017
Issue Date: June 2018
DOI: https://doi.org/10.1007/s11042-017-5050-x
