ABSTRACT
Facial action unit (AU) recognition has attracted increasing attention due to its indispensable role in affective computing, especially in affective human-computer interaction. Because AUs are subtle and transient, it is challenging to capture the delicate and ambiguous motions of local facial regions across consecutive frames. Since context is essential for resolving ambiguity in the human visual system, modeling context within and among facial images emerges as a promising approach to the AU recognition task. To this end, we propose CaFGraph, a novel context-aware facial multi-graph that models both region-level local context, grounded in facial morphology and musculature, and region-level temporal context. CaFGraph is the first work to construct a universal facial multi-graph structure that is independent of both task settings and dataset statistics, and thus applies to almost all fine-grained facial behavior analysis tasks, including but not limited to AU recognition. To make full use of this context, we then present CaFNet, which learns context-aware facial graph representations via CaFGraph from facial images for multi-label AU recognition. Experiments on two widely used benchmark datasets, BP4D and DISFA, demonstrate the superiority of CaFNet over state-of-the-art methods.
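To make the abstract's architecture concrete, the sketch below illustrates the general idea in PyTorch: nodes are per-region features over a short window of frames, intra-frame edges encode morphological/muscular relations between regions (local context), inter-frame edges link corresponding regions across time (temporal context), and a Kipf-and-Welling-style graph convolution feeds a sigmoid head for multi-label AU prediction. This is a minimal sketch of the idea, not the authors' implementation; all names and sizes (`NUM_REGIONS`, `spatial_edges`, the mean-pooling head, etc.) are illustrative assumptions.

```python
# Minimal sketch of a context-aware facial multi-graph, NOT the paper's code.
# All constants and helper names below are illustrative assumptions.
import torch
import torch.nn as nn

NUM_REGIONS = 12   # assumed number of local facial regions (e.g., AU-centered patches)
NUM_FRAMES = 3     # assumed temporal window of consecutive frames
FEAT_DIM = 64      # assumed per-region feature dimension
NUM_AUS = 12       # e.g., the 12 AUs annotated in BP4D

def build_multigraph_adjacency(spatial_edges):
    """Combine intra-frame (spatial) and inter-frame (temporal) edges
    into one normalized adjacency over NUM_FRAMES * NUM_REGIONS nodes."""
    n = NUM_FRAMES * NUM_REGIONS
    adj = torch.eye(n)  # self-loops
    for t in range(NUM_FRAMES):
        off = t * NUM_REGIONS
        # local context: edges between morphologically/muscularly related regions
        for i, j in spatial_edges:
            adj[off + i, off + j] = adj[off + j, off + i] = 1.0
        # temporal context: link each region to itself in the next frame
        if t + 1 < NUM_FRAMES:
            for r in range(NUM_REGIONS):
                adj[off + r, off + NUM_REGIONS + r] = 1.0
                adj[off + NUM_REGIONS + r, off + r] = 1.0
    # symmetric normalization D^{-1/2} A D^{-1/2} (Kipf & Welling, 2016)
    d_inv_sqrt = torch.diag(adj.sum(dim=1).pow(-0.5))
    return d_inv_sqrt @ adj @ d_inv_sqrt

class GCNLayer(nn.Module):
    """One graph convolution: propagate region features over the facial graph."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # x: (batch, nodes, in_dim); adj broadcasts across the batch dimension
        return torch.relu(self.linear(adj @ x))

class MultiLabelAUHead(nn.Module):
    """Pool node features and predict an independent probability per AU."""
    def __init__(self, dim, num_aus):
        super().__init__()
        self.fc = nn.Linear(dim, num_aus)

    def forward(self, x):
        return torch.sigmoid(self.fc(x.mean(dim=1)))

# Usage with toy inputs: placeholder morphological relations and random features.
edges = [(0, 1), (2, 3), (4, 5)]
adj = build_multigraph_adjacency(edges)
feats = torch.randn(2, NUM_FRAMES * NUM_REGIONS, FEAT_DIM)
probs = MultiLabelAUHead(FEAT_DIM, NUM_AUS)(GCNLayer(FEAT_DIM, FEAT_DIM)(feats, adj))
print(probs.shape)  # (2, NUM_AUS)
```

The sigmoid head reflects the multi-label setting: AUs co-occur, so each unit gets its own independent probability rather than a softmax over mutually exclusive classes.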