
6DFLRNet: 6D rotation representation for head pose estimation based on facial landmarks and regression

Published in Multimedia Tools and Applications

Abstract

Head pose estimation methods generally fall into two categories: model-based and appearance-based. Model-based methods rely on facial landmarks for three-dimensional reconstruction and can achieve high precision, but they depend heavily on the accuracy of those landmarks. Appearance-based methods take images as input and produce results through feature extraction and computation; they are more robust but less accurate than model-based methods. This paper proposes a new and effective hybrid method that combines the strengths of both. Unlike conventional model-based methods, the proposed method treats the facial landmarks detected in a 2D image as an input sequence to a neural network and obtains the head pose estimate by regression. It addresses the ambiguous rotation-labeling problem by using a rotation matrix representation, introducing a 6D rotation representation as an intermediate state of the rotation matrix to enable effective direct regression. A face-processing step further improves the robustness of the model in cross-dataset scenarios, and the method achieves strong results even with imprecise face recognition and a simple model. The method consists of three stages: first, face processing is applied to the input image; second, facial landmarks are detected; and third, the landmarks are converted into sequences from which the 6D rotation representation of the head pose is regressed. Extensive experiments on the publicly available BIWI, PRIMA, and DrivFace datasets show that the method is effective and outperforms other state-of-the-art methods, with an average performance improvement of at least 10% across the datasets.




Notes

  1. https://github.com/yinguobing/head-pose-estimation

References

  1. Murphy-Chutorian E, Trivedi MM (2010) Head pose estimation and augmented reality tracking: an integrated system and evaluation for monitoring driver awareness. IEEE Trans Intell Transp Systems 11(2):300–311

  2. Strazdas D, Hintz J, Al-Hamadi A (2021) Robo-HUD: interaction concept for contactless operation of industrial cobotic systems. Appl Sci 11(12):5366

  3. Murphy-Chutorian E, Doshi A, Trivedi MM (2007) Head pose estimation for driver assistance systems: a robust algorithm and experimental evaluation. In: 2007 IEEE Intell Transp Syst Conf, pp 709–714 IEEE

  4. Kim H, Lee S-H, Sohn M-K, Kim D-J, Ryu N (2014) Head pose estimation based on random forests with binary pattern run length matrix. In: Advances in computer science and its applications: CSA 2013, pp 255–260 Springer

  5. Khan K, Khan RU, Leonardi R, Migliorati P, Benini S (2021) Head pose estimation: a survey of the last ten years. Signal Process Image Commun 99:116479

  6. Narayanan A, Kaimal RM, Bijlani K (2016) Estimation of driver head yaw angle using a generic geometric model. IEEE Trans Intell Transp Syst 17(12):3446–3460

  7. Werner P, Saxen F, Al-Hamadi A (2017) Landmark based head pose estimation benchmark and method. In: 2017 IEEE international conference on image processing (ICIP), pp 3909–3913

  8. Barros JMD, Garcia F, Mirbach B, Varanasi K, Stricker D (2018) Combined framework for real-time head pose estimation using facial landmark detection and salient feature tracking. In: VISIGRAPP (5: VISAPP), pp 123–133

  9. Sagonas C, Antonakos E, Tzimiropoulos G, Zafeiriou S, Pantic M (2016) 300 faces in-the-wild challenge: database and results. Image Vis Comput 47:3–18

  10. Malek S, Rossi S (2021) Head pose estimation using facial-landmarks classification for children rehabilitation games. Pattern Recognit Lett 152:406–412

  11. Ma B, Huang R, Qin L (2015) VoD: a novel image representation for head yaw estimation. Neurocomputing 148:455–466

  12. Jain V, Crowley JL (2013) Head pose estimation using multi-scale gaussian derivatives. In: Scandinavian conference on image analysis, pp 319–328 Springer

  13. Gourier N, Hall D, Crowley JL (2004) Estimating face orientation from robust detection of salient facial structures. In: FG net workshop on visual observation of deictic gestures, vol 6, p 7 Citeseer

  14. Zhou Y, Gregson J (2020) WHENet: real-time fine-grained estimation for wide range head pose. arXiv:2005.10353

  15. Ruiz N, Chong E, Rehg JM (2018) Fine-grained head pose estimation without keypoints. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 2074–2083

  16. Lu J, Tan Y-P (2012) Ordinary preserving manifold analysis for human age and head pose estimation. IEEE Trans Hum Mach Syst 43(2):249–258

  17. Diaz-Chito K, Del Rincon JM, Hernández-Sabaté A, Gil D (2018) Continuous head pose estimation using manifold subspace embedding and multivariate regression. IEEE Access 6:18325–18334

  18. Baltrušaitis T, Robinson P, Morency L-P (2016) Openface: an open source facial behavior analysis toolkit. In: 2016 IEEE Winter conference on applications of computer vision (WACV), pp 1–10

  19. Ranjan R, Patel VM, Chellappa R (2017) Hyperface: a deep multi-task learning framework for face detection, landmark localization, pose estimation, and gender recognition. IEEE Trans Pattern Anal Mach Intell 41(1):121–135

  20. Cai Y, Yang M-L, Li J (2015) Multiclass classification based on a deep convolutional network for head pose estimation. Front Inf Technol Electr Eng 16(11):930–939

  21. Hsu H-W, Wu T-Y, Wan S, Wong WH, Lee C-Y (2018) QuatNet: quaternion-based head pose estimation with multiregression loss. IEEE Trans Multimedia 21(4):1035–1046

  22. Wang Y, Liang W, Shen J, Jia Y, Yu L-F (2019) A deep coarse-to-fine network for head pose estimation from synthetic data. Pattern Recognit 94:196–206

  23. Mbouna RO, Kong SG, Chun M-G (2013) Visual analysis of eye state and head pose for driver alertness monitoring. IEEE Trans Intell Transp Syst 14(3):1462–1469

  24. Wang H, Davoine F, Lepetit V, Chaillou C, Pan C (2012) 3-D head tracking via invariant keypoint learning. IEEE Trans Circuits Syst Video Technol 22(8):1113–1126

  25. Ji Q (2002) 3D face pose estimation and tracking from a monocular camera. Image Vis Comput 20(7):499–511

  26. Nikolaidis A, Pitas I (2000) Facial feature extraction and pose determination. Pattern Recognit 33(11):1783–1791

  27. Valenti R, Sebe N, Gevers T (2011) Combining head pose and eye location information for gaze estimation. IEEE Trans Image Process 21(2):802–815

  28. Drouard V, Horaud R, Deleforge A, Ba S, Evangelidis G (2017) Robust head-pose estimation based on partially-latent mixture of linear regressions. IEEE Trans Image Process 26(3):1428–1440

  29. Asthana A, Zafeiriou S, Cheng S, Pantic M (2013) Robust discriminative response map fitting with constrained local models. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3444–3451

  30. Zhu X, Lei Z, Liu X, Shi H, Li SZ (2016) Face alignment across large poses: a 3D solution. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 146–155

  31. Xia J, Cao L, Zhang G, Liao J (2019) Head pose estimation in the wild assisted by facial landmarks based on convolutional neural networks. IEEE Access 7:48470–48483

  32. Kazemi V, Sullivan J (2014) One millisecond face alignment with an ensemble of regression trees. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1867–1874

  33. Bulat A, Tzimiropoulos G (2017) How far are we from solving the 2D & 3D face alignment problem?(and a dataset of 230,000 3D facial landmarks). In: Proceedings of the IEEE international conference on computer vision, pp 1021–1030

  34. Liu L, Ke Z, Huo J, Chen J (2021) Head pose estimation through keypoints matching between reconstructed 3D face model and 2D image. Sensors 21(5):1841

  35. Fanelli G, Gall J, Van Gool L (2011) Real time head pose estimation with random regression forests. CVPR 2011:617–624

  36. Diaz-Chito K, Hernández-Sabaté A, López AM (2016) A reduced feature set for driver head pose estimation. Appl Soft Comput 45:98–107

  37. Hemingway EG, O'Reilly OM (2018) Perspectives on Euler angle singularities, gimbal lock, and the orthogonality of applied forces and applied moments. Multibody Syst Dyn 44:31–56

  38. Hempel T, Abdelrahman AA, Al-Hamadi A (2022) 6D rotation representation for unconstrained head pose estimation. arXiv:2202.12555

  39. Saxena A, Driemeyer J, Ng AY (2009) Learning 3-D object orientation from images. In: 2009 IEEE international conference on robotics and automation, pp 794–800

  40. Zhou Y, Barnes C, Lu J, Yang J, Li H (2019) On the continuity of rotation representations in neural networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 5745–5753

  41. Zhang K, Zhang Z, Li Z, Qiao Y (2016) Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Process Lett 23(10):1499–1503

  42. Zhang H, Wang M, Liu Y, Yuan Y (2020) FDN: feature decoupling network for head pose estimation. Proc AAAI Conf Artif Intell 34:12789–12796

  43. Yang T-Y, Chen Y-T, Lin Y-Y, Chuang Y-Y (2019) FSA-Net: learning fine-grained structure aggregation for head pose estimation from a single image. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 1087–1096


Funding

This work was funded by the Key Research and Development Program of Yunnan Province (202102AA100021); the National Natural Science Foundation of China (Grant No. 62066048); the demonstration project of comprehensive government management and large-scale industrial application of the major special project of CHEOS (89-Y50G31-9001-22/23); the Science Foundation of Yunnan Province (202101AT070167); a grant from the Key Laboratory for Crop Production and Smart Agriculture of Yunnan Province; and the Open Foundation of the Key Laboratory in Software Engineering of Yunnan Province (Grant No. 2020SE407).

Author information

Authors and Affiliations

Authors

Contributions

Na Zhao proposed the idea and wrote the paper. Yaofei Ma implemented the experiments. Xiaopeng Li was responsible for the layout of the paper and produced the experimental figures. Shin-Jye Lee supervised and directed the work. Jian Wang polished the manuscript and provided the experimental equipment.

Corresponding authors

Correspondence to Shin-Jye Lee or Jian Wang.

Ethics declarations

Competing Interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data Availability statement

All datasets used in the paper are publicly available.

Code availability

The code that supports the findings of this study is available from the corresponding author upon reasonable request.

Ethics approval

Not applicable.

Consent to participate

All image data in the paper come from publicly available datasets, and the participants depicted have signed a "Consent for publication" form agreeing to have their images published in these public datasets. Details of the datasets, including descriptions and links, are given in the text.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

To help readers follow the formulas used in this paper, Table 7 describes all symbols that appear in them.

Table 7 Table of symbols

The derivation for converting the 6D rotation representation into a rotation matrix is as follows:

$$\begin{aligned} f_{\text {map}}\left( \begin{bmatrix} \mid &{} \mid \\ \alpha _1 &{} \alpha _2 \\ \mid &{} \mid \end{bmatrix}\right) =\begin{bmatrix} \mid &{} \mid &{} \mid \\ \beta _1 &{} \beta _2 &{} \beta _3 \\ \mid &{} \mid &{} \mid \end{bmatrix} \end{aligned}$$
(12)

The third column vector can be computed from the first two. Following the method in [40], this work applies Gram–Schmidt orthogonalization to the two column vectors of the 6D rotation representation to obtain an orthogonal matrix that satisfies the rotation constraints.

$$\begin{aligned} \begin{gathered} \beta _1=\frac{\alpha _1}{\left\| \alpha _1\right\| } \\ \beta _2=\frac{\gamma _2}{\left\| \gamma _2\right\| },\quad \gamma _2=\alpha _2-\left( \beta _1 \cdot \alpha _2\right) \beta _1 \\ \beta _3=\beta _1 \times \beta _2 \end{gathered} \end{aligned}$$
(13)
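As a quick numerical illustration, the mapping in (13) can be sketched in NumPy. This is our own sketch, not the authors' code; the function name `rot6d_to_matrix` and the sample vectors are chosen for illustration:

```python
import numpy as np

def rot6d_to_matrix(a1, a2):
    """Map a 6D rotation representation (two 3-vectors alpha_1, alpha_2)
    to a 3x3 rotation matrix via Gram-Schmidt orthogonalization, Eq. (13)."""
    b1 = a1 / np.linalg.norm(a1)           # beta_1: normalize the first vector
    g2 = a2 - np.dot(b1, a2) * b1          # gamma_2: remove the beta_1 component
    b2 = g2 / np.linalg.norm(g2)           # beta_2: normalize the residual
    b3 = np.cross(b1, b2)                  # beta_3: completes a right-handed frame
    return np.stack([b1, b2, b3], axis=1)  # columns are beta_1, beta_2, beta_3

# The 6D inputs need not be orthonormal; Gram-Schmidt enforces the constraints.
R = rot6d_to_matrix(np.array([2.0, 0.0, 0.0]), np.array([1.0, 1.0, 0.0]))
# R is orthogonal (R^T R = I) with determinant +1
```

Because the last column is recovered by the cross product, a network only needs to regress the first six entries, which is what makes the representation continuous and regression-friendly.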

The derivation process of the rotation matrix R is as follows:

$$\begin{aligned} R_x(\alpha )=\begin{bmatrix} 1 &{} 0 &{} 0 \\ 0 &{} \cos \alpha &{} -\sin \alpha \\ 0 &{} \sin \alpha &{} \cos \alpha \end{bmatrix} \end{aligned}$$
(14)
$$\begin{aligned} R_y(\beta )=\begin{bmatrix} \cos \beta &{} 0 &{} \sin \beta \\ 0 &{} 1 &{} 0 \\ -\sin \beta &{} 0 &{} \cos \beta \end{bmatrix} \end{aligned}$$
(15)
$$\begin{aligned} R_z(\gamma )=\begin{bmatrix} \cos \gamma &{} -\sin \gamma &{} 0 \\ \sin \gamma &{} \cos \gamma &{} 0 \\ 0 &{} 0 &{} 1 \end{bmatrix} \end{aligned}$$
(16)

The full rotation matrix is obtained by composing the rotations about the X, Y, and Z axes:

$$\begin{aligned} R&=R_z(\gamma )\, R_y(\beta )\, R_x(\alpha ) \\ &=\begin{bmatrix} \cos \gamma \cos \beta &{} \cos \gamma \sin \beta \sin \alpha -\sin \gamma \cos \alpha &{} \cos \gamma \sin \beta \cos \alpha +\sin \gamma \sin \alpha \\ \sin \gamma \cos \beta &{} \sin \gamma \sin \beta \sin \alpha +\cos \gamma \cos \alpha &{} \sin \gamma \sin \beta \cos \alpha -\cos \gamma \sin \alpha \\ -\sin \beta &{} \cos \beta \sin \alpha &{} \cos \beta \cos \alpha \end{bmatrix} \end{aligned}$$
(17)
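The composition in (14)–(17) can be verified numerically. The sketch below is our own illustration (the function name `euler_to_matrix` is ours); it builds R from the three axis rotations, and individual entries can be checked against the closed form in (17), e.g. the bottom-left entry equals -sin(beta):

```python
import numpy as np

def euler_to_matrix(alpha, beta, gamma):
    """Compose R = R_z(gamma) @ R_y(beta) @ R_x(alpha), Eqs. (14)-(17)."""
    Rx = np.array([[1.0, 0.0, 0.0],
                   [0.0, np.cos(alpha), -np.sin(alpha)],
                   [0.0, np.sin(alpha),  np.cos(alpha)]])   # Eq. (14)
    Ry = np.array([[ np.cos(beta), 0.0, np.sin(beta)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(beta), 0.0, np.cos(beta)]])     # Eq. (15)
    Rz = np.array([[np.cos(gamma), -np.sin(gamma), 0.0],
                   [np.sin(gamma),  np.cos(gamma), 0.0],
                   [0.0, 0.0, 1.0]])                        # Eq. (16)
    return Rz @ Ry @ Rx                                     # Eq. (17)

R = euler_to_matrix(0.1, 0.2, 0.3)  # sample roll, pitch, yaw in radians
# R[2, 0] equals -sin(beta), matching the closed-form product in (17)
```

Note that the order of multiplication matters: swapping the factors yields a different (also valid) Euler convention, which is one reason direct Euler-angle regression is ambiguous compared to the rotation matrix itself.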

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhao, N., Ma, Y., Li, X. et al. 6DFLRNet: 6D rotation representation for head pose estimation based on facial landmarks and regression. Multimed Tools Appl 83, 68605–68624 (2024). https://doi.org/10.1007/s11042-023-17731-6
