Abstract
Facial action unit (AU) detection often uses discrete expression categories, such as Angry, Disgust, and Happy, as auxiliary information to enhance performance. However, such categories cannot capture the subtle, continuous variations of AUs. In addition, existing methods are prone to overfitting because available AU datasets are small. This paper proposes a novel fine-grained global expression representation encoder that captures continuous and subtle global facial expressions to improve AU detection. The expression representation reduces overfitting by isolating facial expression from other factors such as identity, background, head pose, and illumination. To further mitigate overfitting, a local AU features module transforms the global expression representation into local facial features for each AU. Finally, the local AU features are fed into an AU classifier that predicts the occurrence of each AU. The proposed method outperforms previous work and achieves state-of-the-art performance on both in-the-lab and in-the-wild datasets, whereas most existing methods are evaluated only on in-the-lab datasets. Its explicit handling of overfitting under limited data is central to this superior performance.
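To make the global-to-local design described above concrete, the following is a minimal sketch of such a pipeline, not the authors' implementation: the class and module names, the ResNet-18 backbone, and all feature sizes are illustrative assumptions. It shows the three stages the abstract names: a global expression encoder, per-AU local feature heads, and per-AU occurrence classifiers.

```python
# Hypothetical sketch of a global-to-local AU detection pipeline.
# Backbone choice, head sizes, and names are assumptions for illustration.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class GlobalToLocalAUDetector(nn.Module):
    def __init__(self, num_aus: int = 12, embed_dim: int = 512):
        super().__init__()
        # Global expression encoder: maps a face image to a single
        # expression embedding (the paper's encoder is trained so that
        # identity, pose, and illumination factors are suppressed; here
        # we stand in a plain ResNet-18 for illustration).
        backbone = resnet18(weights=None)
        backbone.fc = nn.Identity()  # keep the 512-d pooled feature
        self.expression_encoder = backbone
        # Local AU features module: one small head per AU that turns the
        # shared global embedding into an AU-specific feature vector.
        self.local_heads = nn.ModuleList(
            [nn.Sequential(nn.Linear(embed_dim, 128), nn.ReLU())
             for _ in range(num_aus)]
        )
        # Per-AU binary classifiers predicting occurrence logits.
        self.classifiers = nn.ModuleList(
            [nn.Linear(128, 1) for _ in range(num_aus)]
        )

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        global_repr = self.expression_encoder(images)  # (B, 512)
        logits = [clf(head(global_repr))               # (B, 1) per AU
                  for head, clf in zip(self.local_heads, self.classifiers)]
        return torch.cat(logits, dim=1)                # (B, num_aus)

# Usage: AU occurrence probabilities for a batch of face crops.
model = GlobalToLocalAUDetector(num_aus=12)
probs = torch.sigmoid(model(torch.randn(4, 3, 224, 224)))
```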
Funding
This work was supported by the 2022 Hangzhou Key Science and Technology Innovation Program (No. 2022AIZD0054) and the Key Research and Development Program of Zhejiang Province (No. 2022C01011).
Author information
Contributions
Conceptualization: Rudong An, Yu Ding, Wei Zhang, Hao Zeng, Zhigang Deng, Aobo Jin. Methodology: Rudong An, Wei Zhang, Hao Zeng, Yu Ding, Wei Chen. Investigation: Rudong An. Data curation: Aobo Jin, Wei Chen. Writing - review and editing: Rudong An, Wei Zhang, Hao Zeng, Yu Ding, Zhigang Deng, Aobo Jin, Wei Chen.
Ethics declarations
Competing Interests
All authors declare that they have no conflicts of interest.
Ethics approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
An, R., Jin, A., Chen, W. et al. Learning facial expression-aware global-to-local representation for robust action unit detection. Appl Intell 54, 1405–1425 (2024). https://doi.org/10.1007/s10489-023-05154-7
DOI: https://doi.org/10.1007/s10489-023-05154-7