ABSTRACT
Facial action unit (AU) recognition has attracted increasing attention due to its indispensable role in affective computing, especially in affective human-computer interaction. Because AUs are subtle and transient, it is challenging to capture the delicate and ambiguous motions of local facial regions across consecutive frames. Since context is essential for resolving ambiguity in the human visual system, modeling context within and among facial images emerges as a promising approach to the AU recognition task. To this end, we propose CaFGraph, a novel context-aware facial multi-graph that models both region-level local context, grounded in facial morphology and musculature, and region-level temporal context. CaFGraph is the first work to construct a universal facial multi-graph structure that is independent of both task settings and dataset statistics, and thus applies to almost all fine-grained facial behavior analysis tasks, including but not limited to AU recognition. To make full use of this context, we then present CaFNet, which learns context-aware facial graph representations via CaFGraph from facial images for multi-label AU recognition. Experiments on two widely used benchmark datasets, BP4D and DISFA, demonstrate the superiority of CaFNet over state-of-the-art methods.
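To make the abstract's architecture concrete, the sketch below illustrates the general idea in PyTorch: nodes are per-region features over a short window of frames, intra-frame edges encode morphological/muscular relations between regions (local context), inter-frame edges link corresponding regions across time (temporal context), and a Kipf-and-Welling-style graph convolution feeds a sigmoid head for multi-label AU prediction. This is a minimal sketch of the idea, not the authors' implementation; all names and sizes (`NUM_REGIONS`, `spatial_edges`, the mean-pooling head, etc.) are illustrative assumptions.

```python
# Minimal sketch of a context-aware facial multi-graph, NOT the paper's code.
# All constants and helper names below are illustrative assumptions.
import torch
import torch.nn as nn

NUM_REGIONS = 12   # assumed number of local facial regions (e.g., AU-centered patches)
NUM_FRAMES = 3     # assumed temporal window of consecutive frames
FEAT_DIM = 64      # assumed per-region feature dimension
NUM_AUS = 12       # e.g., the 12 AUs annotated in BP4D

def build_multigraph_adjacency(spatial_edges):
    """Combine intra-frame (spatial) and inter-frame (temporal) edges
    into one normalized adjacency over NUM_FRAMES * NUM_REGIONS nodes."""
    n = NUM_FRAMES * NUM_REGIONS
    adj = torch.eye(n)  # self-loops
    for t in range(NUM_FRAMES):
        off = t * NUM_REGIONS
        # local context: edges between morphologically/muscularly related regions
        for i, j in spatial_edges:
            adj[off + i, off + j] = adj[off + j, off + i] = 1.0
        # temporal context: link each region to itself in the next frame
        if t + 1 < NUM_FRAMES:
            for r in range(NUM_REGIONS):
                adj[off + r, off + NUM_REGIONS + r] = 1.0
                adj[off + NUM_REGIONS + r, off + r] = 1.0
    # symmetric normalization D^{-1/2} A D^{-1/2} (Kipf & Welling, 2016)
    d_inv_sqrt = torch.diag(adj.sum(dim=1).pow(-0.5))
    return d_inv_sqrt @ adj @ d_inv_sqrt

class GCNLayer(nn.Module):
    """One graph convolution: propagate region features over the facial graph."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # x: (batch, nodes, in_dim); adj broadcasts across the batch dimension
        return torch.relu(self.linear(adj @ x))

class MultiLabelAUHead(nn.Module):
    """Pool node features and predict an independent probability per AU."""
    def __init__(self, dim, num_aus):
        super().__init__()
        self.fc = nn.Linear(dim, num_aus)

    def forward(self, x):
        return torch.sigmoid(self.fc(x.mean(dim=1)))

# Usage with toy inputs: placeholder morphological relations and random features.
edges = [(0, 1), (2, 3), (4, 5)]
adj = build_multigraph_adjacency(edges)
feats = torch.randn(2, NUM_FRAMES * NUM_REGIONS, FEAT_DIM)
probs = MultiLabelAUHead(FEAT_DIM, NUM_AUS)(GCNLayer(FEAT_DIM, FEAT_DIM)(feats, adj))
print(probs.shape)  # (2, NUM_AUS)
```

The sigmoid head reflects the multi-label setting: AUs co-occur, so each unit gets its own independent probability rather than a softmax over mutually exclusive classes.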