research-article

OsGG-Net: One-step Graph Generation Network for Unbiased Head Pose Estimation

Authors:

Xin Miao

MM '21: Proceedings of the 29th ACM International Conference on Multimedia

Pages 2465-2473

https://doi.org/10.1145/3474085.3475417

Published: 17 October 2021

Abstract

Head pose estimation is a crucial problem that involves predicting the Euler angles of a human head in an image. Previous approaches predict head poses through landmark detection, whose output can also be applied to multiple downstream tasks. However, previous landmark-based methods cannot match the performance of current landmark-free methods because they do not model the complex nonlinear relationships between the geometric distribution of landmarks and head poses. Another cause of this performance bottleneck is that the underlying distribution of 3D pose angles in current head pose benchmarks is biased. In this work, we propose OsGG-Net, a One-step Graph Generation Network that estimates head poses from a single image by generating a landmark-connection graph that robustly models the 3D angles associated with the landmark distribution. To further ease the angle bias that the skewed data distribution introduces when learning the graph structure, we propose the UnBiased Head Pose Dataset (UBHPD) and a new unbiased metric, UBMAE, for unbiased head pose estimation. We conduct extensive experiments on various benchmarks and on UBHPD, where our method achieves state-of-the-art results in terms of both the commonly used MAE metric and our proposed UBMAE. Comprehensive ablation studies also demonstrate the effectiveness of each component of our approach.
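The abstract contrasts the commonly used MAE with the proposed UBMAE but does not define UBMAE on this page. The sketch below only illustrates the idea of angle bias in evaluation. It assumes Euler angles in degrees given as (yaw, pitch, roll) tuples and shows standard MAE next to a hypothetical bin-balanced variant. The function `bin_balanced_mae`, its `bin_width` and `axis` parameters, and the choice to bin by ground-truth yaw are illustrative assumptions, not the paper's UBMAE formula.

```python
import math
from collections import defaultdict


def mae(preds, gts):
    """Standard head-pose MAE in degrees, averaged over all samples and
    the three Euler angles (yaw, pitch, roll)."""
    total = sum(abs(p - g) for pr, gt in zip(preds, gts) for p, g in zip(pr, gt))
    return total / (3 * len(gts))


def bin_balanced_mae(preds, gts, bin_width=15.0, axis=0):
    """Hypothetical unbiased variant (an assumption, NOT the paper's UBMAE):
    group samples into fixed-width bins by their ground-truth angle on one
    axis (default 0 = yaw), compute the mean error inside each non-empty bin,
    then average the bins equally so that rare large-angle bins count as much
    as the crowded near-frontal ones."""
    bins = defaultdict(list)
    for pr, gt in zip(preds, gts):
        err = sum(abs(p - g) for p, g in zip(pr, gt)) / 3  # per-sample error
        bins[math.floor(gt[axis] / bin_width)].append(err)
    per_bin = [sum(v) / len(v) for v in bins.values()]
    return sum(per_bin) / len(per_bin)


# Toy test set: nine frontal faces predicted within 1 degree on every angle,
# plus one profile face (yaw = 80) that is off by 20 degrees on every angle.
gts = [(0.0, 0.0, 0.0)] * 9 + [(80.0, 0.0, 0.0)]
preds = [(1.0, 1.0, 1.0)] * 9 + [(100.0, 20.0, 20.0)]

print(round(mae(preds, gts), 2))               # 2.9
print(round(bin_balanced_mae(preds, gts), 2))  # 10.5
```

Plain MAE (2.9) is dominated by the nine frontal samples, while the bin-balanced score (10.5) exposes the large-pose failure. This is the kind of angle bias that a skewed benchmark distribution can hide.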

Supplementary Material

ZIP File (mfp1450aux.zip, 741.85 KB)

This supplementary material provides a detailed ablation study on the number of landmarks and additional visualization results.

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision tasks
        Activity recognition and understanding
        Biometrics
        Vision for robotics
  2. Machine learning
    1. Learning paradigms
      1. Supervised learning
    2. Machine learning approaches
      1. Neural networks

Information

Published In

MM '21: Proceedings of the 29th ACM International Conference on Multimedia

October 2021

5796 pages

ISBN: 9781450386517

DOI: 10.1145/3474085

General Chairs:
Heng Tao Shen, University of Electronic Science and Technology of China, China
Yueting Zhuang, Zhejiang University, China
John R. Smith, IBM, USA

Program Chairs:
Yang Yang, University of Electronic Science and Technology of China, China
Pablo Cesar, CWI & TU Delft, The Netherlands
Florian Metze, Facebook, Inc., USA
Balakrishnan Prabhakaran, University of Texas at Dallas, USA

Copyright © 2021 ACM.

© 2021 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

CCF-Baidu open fund
National Natural Science Foundation of China

Conference

MM '21: ACM Multimedia Conference

October 20-24, 2021

Virtual Event, China

Sponsor: SIGMM

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Bibliometrics & Citations

Article Metrics

Total Citations: 6

Total Downloads: 182

Downloads (Last 12 months): 13

Downloads (Last 6 weeks): 2

Reflects downloads up to 15 Feb 2025

Citations

Wang Y, Liu H, Feng Y, Li Z, Wu X, Zhu C (2024). HeadDiff: Exploring Rotation Uncertainty With Diffusion Models for Head Pose Estimation. IEEE Transactions on Image Processing, 33, 1868-1882. Online publication date: 2024.
https://doi.org/10.1109/TIP.2024.3372457

Wang Y, Yu Q, Lin L, Li Z, Liu H (2024). Language-Driven Ordinal Learning for Imbalanced Head Pose Estimation. ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 4495-4499. Online publication date: 14-Apr-2024.
https://doi.org/10.1109/ICASSP48485.2024.10448404

Algabri R, Shin H, Lee S (2024). Real-time 6DoF full-range markerless head pose estimation. Expert Systems with Applications: An International Journal, 239:C. Online publication date: 17-Apr-2024.
https://dl.acm.org/doi/10.1016/j.eswa.2023.122293

Algabri R, Abdu A, Lee S (2024). Deep learning and machine learning techniques for head pose estimation: a survey. Artificial Intelligence Review, 57:10. Online publication date: 12-Sep-2024.
https://doi.org/10.1007/s10462-024-10936-7

Shang Z, Xie H, Yu L, Zha Z, Zhang Y (2023). Constructing Spatio-Temporal Graphs for Face Forgery Detection. ACM Transactions on the Web, 17:3, 1-25. Online publication date: 22-May-2023.
https://dl.acm.org/doi/10.1145/3580512

Li Y, Tan G, Gou C (2023). Cascaded Iterative Transformer for Jointly Predicting Facial Landmark, Occlusion Probability and Head Pose. International Journal of Computer Vision, 132:4, 1242-1257. Online publication date: 6-Nov-2023.
https://doi.org/10.1007/s11263-023-01935-2
