Abstract
Head pose estimation methods fall into two broad categories: model-based and appearance-based. Model-based methods use facial landmarks for three-dimensional reconstruction and can achieve high precision, but they depend heavily on the accuracy of those landmarks. Appearance-based methods take the image as input and compute the pose from extracted features. They are more robust, but they are less accurate than model-based methods. This paper proposes a new and effective hybrid method that combines the strengths of both. Unlike conventional model-based methods, the proposed method treats the facial landmarks detected in a 2D image as a sequence of neural-network inputs and obtains the head pose by neural-network regression. It resolves the problem of ambiguous rotation labels with a rotation-matrix representation. A 6D rotation representation serves as an intermediate form of the rotation matrix, which makes direct regression effective. A face-processing step improves the robustness of the model in cross-dataset scenarios. The method achieves strong results despite relying on imprecise face recognition and a simple model. It has three stages. First, face processing is applied to the input image. Second, facial landmarks are detected. Third, the landmarks are converted into a sequence, and the 6D rotation representation of the head pose is obtained from that sequence by regression. Extensive experiments on the publicly available BIWI, PRIMA, and DrivFace datasets show that the method is effective and outperforms other state-of-the-art methods. Compared with these methods, it achieves an average performance improvement of at least 10% across the datasets.
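The third stage turns the detected landmarks into a sequence for the regression network. The sketch below shows one way to do this. The centering and bounding-box scaling are assumptions made for illustration, because the abstract does not say how the landmarks are normalized. `landmarks_to_sequence` is a hypothetical helper and not the authors' code.

```python
from typing import List, Sequence, Tuple


def landmarks_to_sequence(landmarks: Sequence[Tuple[float, float]]) -> List[float]:
    """Flatten 2D landmarks into [x1, y1, x2, y2, ...] for a regression network.

    Hypothetical preprocessing: the landmarks are centered on their mean and
    divided by the longer side of their bounding box, so the sequence does not
    depend on where the face sits in the image or on how large it is.
    """
    if not landmarks:
        raise ValueError("at least one landmark is required")
    n = len(landmarks)
    xs = [x for x, _ in landmarks]
    ys = [y for _, y in landmarks]
    cx, cy = sum(xs) / n, sum(ys) / n
    # Longer bounding-box side; fall back to 1.0 if all points coincide.
    scale = max(max(xs) - min(xs), max(ys) - min(ys)) or 1.0
    sequence: List[float] = []
    for x, y in landmarks:
        sequence.extend(((x - cx) / scale, (y - cy) / scale))
    return sequence


# Four corners of a 2x2 square: centered at (1, 1), bounding-box side 2.
print(landmarks_to_sequence([(0, 0), (2, 0), (0, 2), (2, 2)]))
# [-0.5, -0.5, 0.5, -0.5, -0.5, 0.5, 0.5, 0.5]
```

Translating or uniformly scaling the face leaves the sequence unchanged. Under this assumption, the network sees only the shape of the landmark configuration, and that shape is what carries the pose information.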







References
[1] Murphy-Chutorian E, Trivedi MM (2010) Head pose estimation and augmented reality tracking: an integrated system and evaluation for monitoring driver awareness. IEEE Trans Intell Transp Syst 11(2):300–311
[2] Strazdas D, Hintz J, Al-Hamadi A (2021) Robo-HUD: interaction concept for contactless operation of industrial cobotic systems. Appl Sci 11(12):5366
[3] Murphy-Chutorian E, Doshi A, Trivedi MM (2007) Head pose estimation for driver assistance systems: a robust algorithm and experimental evaluation. In: 2007 IEEE Intell Transp Syst Conf, pp 709–714 IEEE
[4] Kim H, Lee S-H, Sohn M-K, Kim D-J, Ryu N (2014) Head pose estimation based on random forests with binary pattern run length matrix. In: Advances in computer science and its applications: CSA 2013, pp 255–260 Springer
[5] Khan K, Khan RU, Leonardi R, Migliorati P, Benini S (2021) Head pose estimation: a survey of the last ten years. Signal Process Image Commun 99:116479
[6] Narayanan A, Kaimal RM, Bijlani K (2016) Estimation of driver head yaw angle using a generic geometric model. IEEE Trans Intell Transp Syst 17(12):3446–3460
[7] Werner P, Saxen F, Al-Hamadi A (2017) Landmark based head pose estimation benchmark and method. In: 2017 IEEE international conference on image processing (ICIP), pp 3909–3913
[8] Barros JMD, Garcia F, Mirbach B, Varanasi K, Stricker D (2018) Combined framework for real-time head pose estimation using facial landmark detection and salient feature tracking. In: VISIGRAPP (5: VISAPP), pp 123–133
[9] Sagonas C, Antonakos E, Tzimiropoulos G, Zafeiriou S, Pantic M (2016) 300 faces in-the-wild challenge: database and results. Image Vis Comput 47:3–18
[10] Malek S, Rossi S (2021) Head pose estimation using facial-landmarks classification for children rehabilitation games. Pattern Recognit Lett 152:406–412
[11] Ma B, Huang R, Qin L (2015) VoD: a novel image representation for head yaw estimation. Neurocomputing 148:455–466
[12] Jain V, Crowley JL (2013) Head pose estimation using multi-scale Gaussian derivatives. In: Scandinavian conference on image analysis, pp 319–328 Springer
[13] Gourier N, Hall D, Crowley JL (2004) Estimating face orientation from robust detection of salient facial structures. In: FG net workshop on visual observation of deictic gestures, vol 6, p 7 Citeseer
[14] Zhou Y, Gregson J (2020) WHENet: real-time fine-grained estimation for wide range head pose. arXiv:2005.10353
[15] Ruiz N, Chong E, Rehg JM (2018) Fine-grained head pose estimation without keypoints. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 2074–2083
[16] Lu J, Tan Y-P (2012) Ordinary preserving manifold analysis for human age and head pose estimation. IEEE Trans Hum Mach Syst 43(2):249–258
[17] Diaz-Chito K, Del Rincon JM, Hernández-Sabaté A, Gil D (2018) Continuous head pose estimation using manifold subspace embedding and multivariate regression. IEEE Access 6:18325–18334
[18] Baltrušaitis T, Robinson P, Morency L-P (2016) OpenFace: an open source facial behavior analysis toolkit. In: 2016 IEEE Winter conference on applications of computer vision (WACV), pp 1–10
[19] Ranjan R, Patel VM, Chellappa R (2017) HyperFace: a deep multi-task learning framework for face detection, landmark localization, pose estimation, and gender recognition. IEEE Trans Pattern Anal Mach Intell 41(1):121–135
[20] Cai Y, Yang M-L, Li J (2015) Multiclass classification based on a deep convolutional network for head pose estimation. Front Inf Technol Electr Eng 16(11):930–939
[21] Hsu H-W, Wu T-Y, Wan S, Wong WH, Lee C-Y (2018) QuatNet: quaternion-based head pose estimation with multiregression loss. IEEE Trans Multimedia 21(4):1035–1046
[22] Wang Y, Liang W, Shen J, Jia Y, Yu L-F (2019) A deep coarse-to-fine network for head pose estimation from synthetic data. Pattern Recognit 94:196–206
[23] Mbouna RO, Kong SG, Chun M-G (2013) Visual analysis of eye state and head pose for driver alertness monitoring. IEEE Trans Intell Transp Syst 14(3):1462–1469
[24] Wang H, Davoine F, Lepetit V, Chaillou C, Pan C (2012) 3-D head tracking via invariant keypoint learning. IEEE Trans Circuits Syst Video Technol 22(8):1113–1126
[25] Ji Q (2002) 3D face pose estimation and tracking from a monocular camera. Image Vis Comput 20(7):499–511
[26] Nikolaidis A, Pitas I (2000) Facial feature extraction and pose determination. Pattern Recognit 33(11):1783–1791
[27] Valenti R, Sebe N, Gevers T (2011) Combining head pose and eye location information for gaze estimation. IEEE Trans Image Process 21(2):802–815
[28] Drouard V, Horaud R, Deleforge A, Ba S, Evangelidis G (2017) Robust head-pose estimation based on partially-latent mixture of linear regressions. IEEE Trans Image Process 26(3):1428–1440
[29] Asthana A, Zafeiriou S, Cheng S, Pantic M (2013) Robust discriminative response map fitting with constrained local models. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3444–3451
[30] Zhu X, Lei Z, Liu X, Shi H, Li SZ (2016) Face alignment across large poses: a 3D solution. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 146–155
[31] Xia J, Cao L, Zhang G, Liao J (2019) Head pose estimation in the wild assisted by facial landmarks based on convolutional neural networks. IEEE Access 7:48470–48483
[32] Kazemi V, Sullivan J (2014) One millisecond face alignment with an ensemble of regression trees. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1867–1874
[33] Bulat A, Tzimiropoulos G (2017) How far are we from solving the 2D & 3D face alignment problem? (and a dataset of 230,000 3D facial landmarks). In: Proceedings of the IEEE international conference on computer vision, pp 1021–1030
[34] Liu L, Ke Z, Huo J, Chen J (2021) Head pose estimation through keypoints matching between reconstructed 3D face model and 2D image. Sensors 21(5):1841
[35] Fanelli G, Gall J, Van Gool L (2011) Real time head pose estimation with random regression forests. In: CVPR 2011, pp 617–624
[36] Diaz-Chito K, Hernández-Sabaté A, López AM (2016) A reduced feature set for driver head pose estimation. Appl Soft Comput 45:98–107
[37] Hemingway EG, O'Reilly OM (2018) Perspectives on Euler angle singularities, gimbal lock, and the orthogonality of applied forces and applied moments. Multibody Syst Dyn 44:31–56
[38] Hempel T, Abdelrahman AA, Al-Hamadi A (2022) 6D rotation representation for unconstrained head pose estimation. arXiv:2202.12555
[39] Saxena A, Driemeyer J, Ng AY (2009) Learning 3-D object orientation from images. In: 2009 IEEE international conference on robotics and automation, pp 794–800
[40] Zhou Y, Barnes C, Lu J, Yang J, Li H (2019) On the continuity of rotation representations in neural networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 5745–5753
[41] Zhang K, Zhang Z, Li Z, Qiao Y (2016) Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Process Lett 23(10):1499–1503
[42] Zhang H, Wang M, Liu Y, Yuan Y (2020) FDN: feature decoupling network for head pose estimation. Proc AAAI Conf Artif Intell 34:12789–12796
[43] Yang T-Y, Chen Y-T, Lin Y-Y, Chuang Y-Y (2019) FSA-Net: learning fine-grained structure aggregation for head pose estimation from a single image. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 1087–1096
Funding
This work was funded by the Key Research and Development Program of Yunnan Province (202102AA100021); the National Natural Science Foundation of China (Grant No. 62066048); the demonstration project of comprehensive government management and large-scale industrial application of the major special project of CHEOS (89-Y50G31-9001-22/23); and the Science Foundation of Yunnan Province (202101AT070167). It was also supported by a grant from the Key Laboratory for Crop Production and Smart Agriculture of Yunnan Province and by the Open Foundation of Key Laboratory in Software Engineering of Yunnan Province under Grant No. 2020SE407.
Author information
Contributions
Na Zhao proposed the idea of the paper and wrote it. Yaofei Ma carried out the experiments. Xiaopeng Li was responsible for the layout of the paper and prepared the figures of the experimental results. Shin-Jye Lee supervised and directed the work. Jian Wang revised and polished the paper and provided the experimental equipment.
Ethics declarations
Competing interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Data availability statement
All datasets used in the paper are publicly available.
Code availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Ethics approval
Not applicable.
Consent to participate
All image data in the paper come from publicly available datasets. The participants shown in the figures signed a "Consent for publication" form and agreed to have their images published in those datasets. Descriptions of these datasets, including links, are given in the text.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
To help readers follow the formulas used in the paper, Table 7 describes every symbol that appears in them.
The 6D rotation representation is converted to a rotation matrix as follows.
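For reference, here is the Gram–Schmidt mapping of [40], written with assumed symbols (the notation in Table 7 may differ). The network output $\mathbf{x} \in \mathbb{R}^6$ is split into two 3-vectors $\mathbf{a}_1$ and $\mathbf{a}_2$. These are turned into the columns $\mathbf{b}_1, \mathbf{b}_2, \mathbf{b}_3$ of $R$:

```latex
\begin{aligned}
\mathbf{a}_1 &= (x_1, x_2, x_3)^\top, \qquad \mathbf{a}_2 = (x_4, x_5, x_6)^\top, \\
\mathbf{b}_1 &= \frac{\mathbf{a}_1}{\lVert \mathbf{a}_1 \rVert}, \\
\mathbf{b}_2 &= \frac{\mathbf{a}_2 - (\mathbf{b}_1^\top \mathbf{a}_2)\,\mathbf{b}_1}
                    {\lVert \mathbf{a}_2 - (\mathbf{b}_1^\top \mathbf{a}_2)\,\mathbf{b}_1 \rVert}, \\
\mathbf{b}_3 &= \mathbf{b}_1 \times \mathbf{b}_2, \\
R &= \begin{bmatrix} \mathbf{b}_1 & \mathbf{b}_2 & \mathbf{b}_3 \end{bmatrix}.
\end{aligned}
```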
The last column vector is computed from the first two. Following the method in [40], this work orthonormalizes the two column vectors of the 6D rotation representation with the Gram–Schmidt process. The third column is then the cross product of the resulting orthonormal vectors, so the final matrix is orthogonal and satisfies the constraints of a rotation matrix.
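Below is a minimal pure-Python sketch of the Gram–Schmidt construction described above, in the spirit of [40]. Two points are assumptions. First, the layout of the 6D vector: the first three values are taken as the first column and the last three as the second. Second, the name `rotation_from_6d` is hypothetical. The code is not taken from the authors' implementation.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def _normalize(v: Sequence[float], eps: float = 1e-12) -> Vector:
    norm = math.sqrt(_dot(v, v))
    if norm < eps:
        raise ValueError("cannot normalize a (near-)zero vector")
    return [c / norm for c in v]


def _cross(a: Sequence[float], b: Sequence[float]) -> Vector:
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def rotation_from_6d(x: Sequence[float]) -> Matrix:
    """Map a 6D vector to a 3x3 rotation matrix via Gram-Schmidt.

    Assumed layout: x[0:3] is the first column, x[3:6] the second column.
    Raises ValueError if the two 3-vectors are (nearly) parallel.
    """
    if len(x) != 6:
        raise ValueError("expected 6 values")
    a1, a2 = list(x[:3]), list(x[3:])
    b1 = _normalize(a1)                                      # first column
    proj = _dot(b1, a2)
    b2 = _normalize([u - proj * v for u, v in zip(a2, b1)])  # drop the b1 component
    b3 = _cross(b1, b2)                                      # third column
    # Assemble R row by row so that b1, b2, b3 are its columns.
    return [[b1[i], b2[i], b3[i]] for i in range(3)]


R = rotation_from_6d([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
for row in R:
    print([round(v, 4) for v in row])
```

The third column is the cross product of two orthonormal vectors, so the result always has determinant +1. Any non-degenerate network output therefore maps to a valid rotation. The construction fails only when the two 3-vectors are (nearly) parallel.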
The rotation matrix R is derived as follows. It is obtained by successive rotations about the X, Y, and Z axes:
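One standard way to write this composition assumes intrinsic X-Y-Z rotations, with pitch $\alpha$ about X, yaw $\beta$ about Y, and roll $\gamma$ about Z. The angle names and the order of multiplication are both assumptions. If the same X-Y-Z sequence is read as rotations about fixed (extrinsic) axes, the product becomes $R_z(\gamma) R_y(\beta) R_x(\alpha)$ instead. Here $c_\theta = \cos\theta$ and $s_\theta = \sin\theta$:

```latex
\begin{aligned}
R_x(\alpha) &= \begin{bmatrix} 1 & 0 & 0 \\ 0 & c_\alpha & -s_\alpha \\ 0 & s_\alpha & c_\alpha \end{bmatrix}, \quad
R_y(\beta) = \begin{bmatrix} c_\beta & 0 & s_\beta \\ 0 & 1 & 0 \\ -s_\beta & 0 & c_\beta \end{bmatrix}, \quad
R_z(\gamma) = \begin{bmatrix} c_\gamma & -s_\gamma & 0 \\ s_\gamma & c_\gamma & 0 \\ 0 & 0 & 1 \end{bmatrix}, \\
R &= R_x(\alpha)\, R_y(\beta)\, R_z(\gamma) =
\begin{bmatrix}
c_\beta c_\gamma & -c_\beta s_\gamma & s_\beta \\
c_\alpha s_\gamma + s_\alpha s_\beta c_\gamma & c_\alpha c_\gamma - s_\alpha s_\beta s_\gamma & -s_\alpha c_\beta \\
s_\alpha s_\gamma - c_\alpha s_\beta c_\gamma & s_\alpha c_\gamma + c_\alpha s_\beta s_\gamma & c_\alpha c_\beta
\end{bmatrix}.
\end{aligned}
```

Inverting this form recovers the angles from a predicted $R$ whenever $c_\beta \neq 0$: $\beta = \arcsin R_{13}$, $\alpha = \operatorname{atan2}(-R_{23}, R_{33})$, and $\gamma = \operatorname{atan2}(-R_{12}, R_{11})$. At $\beta = \pm 90^\circ$ the first and third rotations act about the same axis (gimbal lock). This is one source of the label ambiguity that the rotation-matrix representation avoids.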
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhao, N., Ma, Y., Li, X. et al. 6DFLRNet: 6D rotation representation for head pose estimation based on facial landmarks and regression. Multimed Tools Appl 83, 68605–68624 (2024). https://doi.org/10.1007/s11042-023-17731-6
DOI: https://doi.org/10.1007/s11042-023-17731-6