6DFLRNet: 6D rotation representation for head pose estimation based on facial landmarks and regression

Zhao, Na; Ma, Yaofei; Li, Xiaopeng; Lee, Shin-Jye; Wang, Jian

doi:10.1007/s11042-023-17731-6

Published: 26 January 2024

Volume 83, pages 68605–68624, (2024)
Multimedia Tools and Applications

Na Zhao^1,2, Yaofei Ma^1,2, Xiaopeng Li^1,2, Shin-Jye Lee^3 & Jian Wang^4 (ORCID: orcid.org/0009-0000-1652-9302)


Abstract

Head pose estimation methods generally fall into two categories: model-based and appearance-based. Model-based methods rely on facial landmarks for three-dimensional reconstruction and aim for high precision, but they depend heavily on the accuracy of those landmarks. Appearance-based methods take images as input and produce results through feature extraction and computation; they are more robust, but less accurate than model-based methods. This paper proposes a new and effective hybrid method that combines the strengths of both. Unlike conventional model-based methods, the proposed method treats the facial landmarks in 2D images as an input sequence for a neural network and obtains the head pose by neural network regression. It addresses the ambiguous rotation labeling problem by using a rotation matrix representation, and it introduces a 6D rotation representation as an intermediate form of the rotation matrix to make direct regression effective. Face processing further improves the robustness of the model in cross-dataset scenarios. The method achieves remarkable results despite imprecise face recognition and a simple model. It consists of three parts: first, face processing is applied to the input image; second, facial landmarks are detected; and third, the landmarks are converted into sequences and the 6D rotation representation of the head pose is obtained by regression. Extensive experiments on the publicly available BIWI, PRIMA, and DrivFace datasets show that the method works and performs better than other state-of-the-art methods, with an average performance improvement of at least 10% across the datasets.


Notes

https://github.com/yinguobing/head-pose-estimation

References

1. Murphy-Chutorian E, Trivedi MM (2010) Head pose estimation and augmented reality tracking: an integrated system and evaluation for monitoring driver awareness. IEEE Trans Intell Transp Syst 11(2):300–311
2. Strazdas D, Hintz J, Al-Hamadi A (2021) Robo-HUD: interaction concept for contactless operation of industrial cobotic systems. Appl Sci 11(12):5366
3. Murphy-Chutorian E, Doshi A, Trivedi MM (2007) Head pose estimation for driver assistance systems: a robust algorithm and experimental evaluation. In: 2007 IEEE Intell Transp Syst Conf, pp 709–714. IEEE
4. Kim H, Lee S-H, Sohn M-K, Kim D-J, Ryu N (2014) Head pose estimation based on random forests with binary pattern run length matrix. In: Advances in computer science and its applications: CSA 2013, pp 255–260. Springer
5. Khan K, Khan RU, Leonardi R, Migliorati P, Benini S (2021) Head pose estimation: a survey of the last ten years. Signal Process Image Commun 99:116479
6. Narayanan A, Kaimal RM, Bijlani K (2016) Estimation of driver head yaw angle using a generic geometric model. IEEE Trans Intell Transp Syst 17(12):3446–3460
7. Werner P, Saxen F, Al-Hamadi A (2017) Landmark based head pose estimation benchmark and method. In: 2017 IEEE international conference on image processing (ICIP), pp 3909–3913
8. Barros JMD, Garcia F, Mirbach B, Varanasi K, Stricker D (2018) Combined framework for real-time head pose estimation using facial landmark detection and salient feature tracking. In: VISIGRAPP (5: VISAPP), pp 123–133
9. Sagonas C, Antonakos E, Tzimiropoulos G, Zafeiriou S, Pantic M (2016) 300 faces in-the-wild challenge: database and results. Image Vis Comput 47:3–18
10. Malek S, Rossi S (2021) Head pose estimation using facial-landmarks classification for children rehabilitation games. Pattern Recognit Lett 152:406–412
11. Ma B, Huang R, Qin L (2015) VoD: a novel image representation for head yaw estimation. Neurocomputing 148:455–466
12. Jain V, Crowley JL (2013) Head pose estimation using multi-scale gaussian derivatives. In: Scandinavian conference on image analysis, pp 319–328. Springer
13. Gourier N, Hall D, Crowley JL (2004) Estimating face orientation from robust detection of salient facial structures. In: FG net workshop on visual observation of deictic gestures, vol 6, p 7. Citeseer
14. Zhou Y, Gregson J (2020) WHENet: real-time fine-grained estimation for wide range head pose. arXiv:2005.10353
15. Ruiz N, Chong E, Rehg JM (2018) Fine-grained head pose estimation without keypoints. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 2074–2083
16. Lu J, Tan Y-P (2012) Ordinary preserving manifold analysis for human age and head pose estimation. IEEE Trans Hum Mach Syst 43(2):249–258
17. Diaz-Chito K, Del Rincon JM, Hernández-Sabaté A, Gil D (2018) Continuous head pose estimation using manifold subspace embedding and multivariate regression. IEEE Access 6:18325–18334
18. Baltrušaitis T, Robinson P, Morency L-P (2016) Openface: an open source facial behavior analysis toolkit. In: 2016 IEEE Winter conference on applications of computer vision (WACV), pp 1–10
19. Ranjan R, Patel VM, Chellappa R (2017) Hyperface: a deep multi-task learning framework for face detection, landmark localization, pose estimation, and gender recognition. IEEE Trans Pattern Anal Mach Intell 41(1):121–135
20. Cai Y, Yang M-L, Li J (2015) Multiclass classification based on a deep convolutional network for head pose estimation. Front Inf Technol Electr Eng 16(11):930–939
21. Hsu H-W, Wu T-Y, Wan S, Wong WH, Lee C-Y (2018) QuatNet: quaternion-based head pose estimation with multiregression loss. IEEE Trans Multimedia 21(4):1035–1046
22. Wang Y, Liang W, Shen J, Jia Y, Yu L-F (2019) A deep coarse-to-fine network for head pose estimation from synthetic data. Pattern Recognit 94:196–206
23. Mbouna RO, Kong SG, Chun M-G (2013) Visual analysis of eye state and head pose for driver alertness monitoring. IEEE Trans Intell Transp Syst 14(3):1462–1469
24. Wang H, Davoine F, Lepetit V, Chaillou C, Pan C (2012) 3-D head tracking via invariant keypoint learning. IEEE Trans Circuits Syst Video Technol 22(8):1113–1126
25. Ji Q (2002) 3D face pose estimation and tracking from a monocular camera. Image Vis Comput 20(7):499–511
26. Nikolaidis A, Pitas I (2000) Facial feature extraction and pose determination. Pattern Recognit 33(11):1783–1791
27. Valenti R, Sebe N, Gevers T (2011) Combining head pose and eye location information for gaze estimation. IEEE Trans Image Process 21(2):802–815
28. Drouard V, Horaud R, Deleforge A, Ba S, Evangelidis G (2017) Robust head-pose estimation based on partially-latent mixture of linear regressions. IEEE Trans Image Process 26(3):1428–1440
29. Asthana A, Zafeiriou S, Cheng S, Pantic M (2013) Robust discriminative response map fitting with constrained local models. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3444–3451
30. Zhu X, Lei Z, Liu X, Shi H, Li SZ (2016) Face alignment across large poses: a 3D solution. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 146–155
31. Xia J, Cao L, Zhang G, Liao J (2019) Head pose estimation in the wild assisted by facial landmarks based on convolutional neural networks. IEEE Access 7:48470–48483
32. Kazemi V, Sullivan J (2014) One millisecond face alignment with an ensemble of regression trees. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1867–1874
33. Bulat A, Tzimiropoulos G (2017) How far are we from solving the 2D & 3D face alignment problem? (and a dataset of 230,000 3D facial landmarks). In: Proceedings of the IEEE international conference on computer vision, pp 1021–1030
34. Liu L, Ke Z, Huo J, Chen J (2021) Head pose estimation through keypoints matching between reconstructed 3D face model and 2D image. Sensors 21(5):1841
35. Fanelli G, Gall J, Van Gool L (2011) Real time head pose estimation with random regression forests. CVPR 2011:617–624
36. Diaz-Chito K, Hernández-Sabaté A, López AM (2016) A reduced feature set for driver head pose estimation. Appl Soft Comput 45:98–107
37. Hemingway EG, O'Reilly OM (2018) Perspectives on euler angle singularities, gimbal lock, and the orthogonality of applied forces and applied moments. Multibody Syst Dyn 44:31–56
38. Hempel T, Abdelrahman AA, Al-Hamadi A (2022) 6D rotation representation for unconstrained head pose estimation. arXiv:2202.12555
39. Saxena A, Driemeyer J, Ng AY (2009) Learning 3-D object orientation from images. In: 2009 IEEE international conference on robotics and automation, pp 794–800
40. Zhou Y, Barnes C, Lu J, Yang J, Li H (2019) On the continuity of rotation representations in neural networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 5745–5753
41. Zhang K, Zhang Z, Li Z, Qiao Y (2016) Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Process Lett 23(10):1499–1503
42. Zhang H, Wang M, Liu Y, Yuan Y (2020) FDN: feature decoupling network for head pose estimation. Proc AAAI Conf Artif Intell 34:12789–12796
43. Yang T-Y, Chen Y-T, Lin Y-Y, Chuang Y-Y (2019) FSA-Net: learning fine-grained structure aggregation for head pose estimation from a single image. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 1087–1096

Funding

This work was funded by the Key Research and Development Program of Yunnan Province (202102AA100021); the National Natural Science Foundation of China (Grant No. 62066048); the demonstration project of comprehensive government management and large-scale industrial application of the major special project of CHEOS (89-Y50G31-9001-22/23); and the Science Foundation of Yunnan Province (202101AT070167). It was also supported by a grant from the Key Laboratory for Crop Production and Smart Agriculture of Yunnan Province and by the Open Foundation of Key Laboratory in Software Engineering of Yunnan Province under Grant No. 2020SE407.

Author information

Authors and Affiliations

Engineering Research Center of Cyberspace, School of Software, Yunnan University, 650504, Kunming, Yunnan, China
Na Zhao, Yaofei Ma & Xiaopeng Li
Key Laboratory in Software Engineering of Yunnan Province, Yunnan University, 650091, Kunming, People’s Republic of China
Na Zhao, Yaofei Ma & Xiaopeng Li
Institute of Management of Technology, National Yang Ming Chiao Tung University, 300, Hsinchu, Taiwan
Shin-Jye Lee
Faculty of Information Engineering and Automation, Kunming University of Science and Technology, 650504, Kunming, People’s Republic of China
Jian Wang

Contributions

Na Zhao proposed the idea of the paper and wrote the paper. Yaofei Ma carried out the experiments. Xiaopeng Li was responsible for the layout of the paper and for drawing the experimental figures. Shin-Jye Lee supervised and directed the work. Jian Wang revised the manuscript and provided the experimental equipment.

Corresponding authors

Correspondence to Shin-Jye Lee or Jian Wang.

Ethics declarations

Competing Interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data Availability statement

All datasets used in the paper are publicly available.

Code availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Ethics approval

Not applicable.

Consent to participate

All image data in the paper come from publicly available datasets, whose participants signed a "Consent for publication" form and agreed to have their images published in public datasets. Details of these datasets, including descriptions and links, are given in the text. All datasets used in the paper are publicly available.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

To help readers follow the formulas used in the paper, Table 7 describes all the symbols that appear in them.

Table 7 Table of symbols

The mapping from the 6D rotation representation to the rotation matrix is as follows:

$$f_{\text{map}}\left( \begin{bmatrix} \mid & \mid \\ \alpha_1 & \alpha_2 \\ \mid & \mid \end{bmatrix} \right) = \begin{bmatrix} \mid & \mid & \mid \\ \beta_1 & \beta_2 & \beta_3 \\ \mid & \mid & \mid \end{bmatrix}$$

(12)

The last column vector can be computed from the first two. Following [40], this work applies Gram–Schmidt orthogonalization to the two column vectors of the 6D rotation representation to obtain an orthogonal matrix that satisfies the constraints.

$$\begin{aligned} \beta_1 &= \frac{\alpha_1}{\left\| \alpha_1 \right\|} \\ \beta_2 &= \frac{\gamma_2}{\left\| \gamma_2 \right\|}, \quad \gamma_2 = \alpha_2 - \left( \beta_1 \cdot \alpha_2 \right) \beta_1 \\ \beta_3 &= \beta_1 \times \beta_2 \end{aligned}$$

(13)
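The Gram–Schmidt mapping in Eq. (13) can be sketched in a few lines of NumPy. This is only an illustration, not the authors' implementation: the function name `six_d_to_rotation_matrix` and the assumption that the six regressed values are stored as α₁ followed by α₂ are hypothetical choices made for the example.

```python
import numpy as np

def six_d_to_rotation_matrix(six_d):
    """Map a 6D representation to a 3x3 rotation matrix via Gram-Schmidt (Eq. 13).

    Assumption for this sketch: the first three values are alpha_1 and the
    last three are alpha_2. The layout used in the paper is not stated here.
    """
    six_d = np.asarray(six_d, dtype=float)
    a1, a2 = six_d[:3], six_d[3:]
    b1 = a1 / np.linalg.norm(a1)        # beta_1: normalized first column
    g2 = a2 - np.dot(b1, a2) * b1       # gamma_2: remove the beta_1 component
    b2 = g2 / np.linalg.norm(g2)        # beta_2: normalized second column
    b3 = np.cross(b1, b2)               # beta_3: completes a right-handed frame
    return np.stack([b1, b2, b3], axis=1)  # columns are beta_1, beta_2, beta_3

# An arbitrary, non-orthogonal 6D vector such as a network might output.
raw = np.array([1.0, 0.2, -0.1, 0.3, 0.9, 0.4])
R = six_d_to_rotation_matrix(raw)
```

Because the third column is a cross product of two orthonormal vectors, the result is orthogonal with determinant +1 for any input whose two halves are nonzero and not parallel.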

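The rotation matrix R is derived as follows, where α, β, and γ here denote the rotation angles about the x, y, and z axes (not the column vectors of Eqs. 12 and 13):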

$$R_x(\alpha) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha \\ 0 & \sin\alpha & \cos\alpha \end{bmatrix}$$

(14)

$$R_y(\beta) = \begin{bmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{bmatrix}$$

(15)

$$R_z(\gamma) = \begin{bmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

(16)

Composing the rotations about the X, Y, and Z axes, applied in that order, gives the rotation matrix:

$$\begin{aligned} R &= R_z(\gamma)\, R_y(\beta)\, R_x(\alpha) \\ &= \begin{bmatrix} \cos\gamma\cos\beta & \cos\gamma\sin\beta\sin\alpha - \sin\gamma\cos\alpha & \cos\gamma\sin\beta\cos\alpha + \sin\gamma\sin\alpha \\ \sin\gamma\cos\beta & \sin\gamma\sin\beta\sin\alpha + \cos\gamma\cos\alpha & \sin\gamma\sin\beta\cos\alpha - \cos\gamma\sin\alpha \\ -\sin\beta & \cos\beta\sin\alpha & \cos\beta\cos\alpha \end{bmatrix} \end{aligned}$$

(17)
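As a numerical check of Eqs. (14) to (17), the composition can be written directly in NumPy. This is a minimal sketch rather than code from the paper: the helper names are hypothetical, and the inverse mapping `matrix_to_euler` is an added illustration that follows from the entries of Eq. (17) and holds only when cos β > 0, i.e. away from the gimbal-lock configuration.

```python
import numpy as np

def rot_x(alpha):
    c, s = np.cos(alpha), np.sin(alpha)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])   # Eq. 14

def rot_y(beta):
    c, s = np.cos(beta), np.sin(beta)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])   # Eq. 15

def rot_z(gamma):
    c, s = np.cos(gamma), np.sin(gamma)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])   # Eq. 16

def euler_to_matrix(alpha, beta, gamma):
    """Compose R = R_z(gamma) R_y(beta) R_x(alpha) as in Eq. 17."""
    return rot_z(gamma) @ rot_y(beta) @ rot_x(alpha)

def matrix_to_euler(R):
    """Recover (alpha, beta, gamma) from R, assuming cos(beta) > 0."""
    beta = -np.arcsin(np.clip(R[2, 0], -1.0, 1.0))  # R[2,0] = -sin(beta)
    alpha = np.arctan2(R[2, 1], R[2, 2])            # cos(beta) * (sin, cos) of alpha
    gamma = np.arctan2(R[1, 0], R[0, 0])            # cos(beta) * (sin, cos) of gamma
    return alpha, beta, gamma

angles = (0.3, -0.5, 1.1)            # radians, arbitrary test values
R = euler_to_matrix(*angles)
recovered = matrix_to_euler(R)
```

With these definitions, the entries of `R` match the closed form of Eq. (17) term by term, and `recovered` reproduces the input angles up to floating-point error.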

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhao, N., Ma, Y., Li, X. et al. 6DFLRNet: 6D rotation representation for head pose estimation based on facial landmarks and regression. Multimed Tools Appl 83, 68605–68624 (2024). https://doi.org/10.1007/s11042-023-17731-6

Received: 12 April 2023
Revised: 04 November 2023
Accepted: 24 November 2023
Published: 26 January 2024
Issue Date: August 2024
DOI: https://doi.org/10.1007/s11042-023-17731-6
