DOI:10.1145/3474085.3475417
research-article

OsGG-Net: One-step Graph Generation Network for Unbiased Head Pose Estimation

Published: 17 October 2021

Abstract

Head pose estimation is a crucial problem that involves predicting the Euler angles of a human head in an image. Previous approaches predict head poses through landmark detection, which can be applied to multiple downstream tasks. However, these landmark-based methods cannot match the performance of current landmark-free methods because they do not model the complex nonlinear relationships between the geometric distribution of landmarks and head poses. Another reason for the performance bottleneck is that the 3D pose angles in current head pose benchmarks follow a biased underlying distribution. In this work, we propose OsGG-Net, a One-step Graph Generation Network that estimates head poses from a single image by generating a landmark-connection graph to robustly model the 3D angle associated with the landmark distribution. To further ease the angle bias that the biased data distribution causes when learning the graph structure, we propose the UnBiased Head Pose Dataset (UBHPD) and a new unbiased metric, UBMAE, for unbiased head pose estimation. We conduct extensive experiments on various benchmarks and UBHPD, where our method achieves state-of-the-art results in terms of the commonly used MAE metric and our proposed UBMAE. Comprehensive ablation studies also demonstrate the effectiveness of each part of our approach.
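To make the evaluation terms in the abstract concrete, the following is a minimal sketch of MAE over Euler angles (yaw, pitch, roll, in degrees). The abstract does not define UBMAE, so this sketch does not attempt to reproduce it. The `bin_balanced_mae` function, its `bin_width` parameter, and the toy data are hypothetical and serve only to illustrate why a biased angle distribution can make a plain MAE look better than performance on rare poses would suggest.

```python
from collections import defaultdict

def mae(pred, true):
    """Mean absolute error, in degrees, over every angle of every sample."""
    errors = [abs(p - t) for ps, ts in zip(pred, true) for p, t in zip(ps, ts)]
    return sum(errors) / len(errors)

def bin_balanced_mae(pred, true, bin_width=30.0):
    """Hypothetical illustration, NOT the paper's UBMAE.

    Groups samples by ground-truth yaw bin, computes the MAE inside each
    bin, then averages the bin MAEs so every bin counts equally.
    """
    per_bin = defaultdict(list)
    for ps, ts in zip(pred, true):
        yaw_bin = int(ts[0] // bin_width)  # bin index from ground-truth yaw
        sample_err = sum(abs(p - t) for p, t in zip(ps, ts)) / len(ts)
        per_bin[yaw_bin].append(sample_err)
    bin_means = [sum(v) / len(v) for v in per_bin.values()]
    return sum(bin_means) / len(bin_means)

# Toy data (assumed): three near-frontal samples and one profile sample.
true = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0), (80.0, 0.0, 0.0)]
pred = [(1.0, 0.0, 0.0), (6.0, 0.0, 0.0), (11.0, 0.0, 0.0), (68.0, 0.0, 0.0)]

plain = mae(pred, true)                   # dominated by the frequent frontal poses
balanced = bin_balanced_mae(pred, true)   # the rare profile bin weighs as much as the frontal bin
print(f"plain MAE: {plain:.3f}, bin-balanced MAE: {balanced:.3f}")
```

On this toy data the bin-balanced value is larger than the plain MAE because the single large-yaw sample carries the largest error. The sketch ignores angle wrap-around (for example, 179 vs. -179 degrees), which a real evaluation would need to handle.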

Supplementary Material

ZIP File (mfp1450aux.zip)
This supplementary material provides a detailed ablation study on the number of landmarks and additional visualization results.

Published In

MM '21: Proceedings of the 29th ACM International Conference on Multimedia
October 2021
5796 pages
ISBN:9781450386517
DOI:10.1145/3474085
© 2021 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. graph generation
  2. head pose estimation
  3. unbiased datasets

Qualifiers

  • Research-article

Conference

MM '21
MM '21: ACM Multimedia Conference
October 20 - 24, 2021
Virtual Event, China

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 13
  • Downloads (Last 6 weeks): 2
Reflects downloads up to 15 Feb 2025

Citations

Cited By

  • (2024) HeadDiff: Exploring Rotation Uncertainty With Diffusion Models for Head Pose Estimation. IEEE Transactions on Image Processing 33 (1868-1882). DOI:10.1109/TIP.2024.3372457. Online publication date: 2024
  • (2024) Language-Driven Ordinal Learning for Imbalanced Head Pose Estimation. ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (4495-4499). DOI:10.1109/ICASSP48485.2024.10448404. Online publication date: 14-Apr-2024
  • (2024) Real-time 6DoF full-range markerless head pose estimation. Expert Systems with Applications: An International Journal 239:C. DOI:10.1016/j.eswa.2023.122293. Online publication date: 17-Apr-2024
  • (2024) Deep learning and machine learning techniques for head pose estimation: a survey. Artificial Intelligence Review 57:10. DOI:10.1007/s10462-024-10936-7. Online publication date: 12-Sep-2024
  • (2023) Constructing Spatio-Temporal Graphs for Face Forgery Detection. ACM Transactions on the Web 17:3 (1-25). DOI:10.1145/3580512. Online publication date: 22-May-2023
  • (2023) Cascaded Iterative Transformer for Jointly Predicting Facial Landmark, Occlusion Probability and Head Pose. International Journal of Computer Vision 132:4 (1242-1257). DOI:10.1007/s11263-023-01935-2. Online publication date: 6-Nov-2023
