Learning facial expression-aware global-to-local representation for robust action unit detection

Published in Applied Intelligence

Abstract

Facial action unit (AU) detection often uses discrete expression categories, such as Angry, Disgust, and Happy, as auxiliary information to enhance performance. However, these coarse categories cannot capture the subtle variations of AUs. In addition, existing works suffer from overfitting because the available AU datasets are small. This paper proposes a novel fine-grained global expression representation encoder that captures continuous and subtle global facial expressions to improve AU detection. The expression representation reduces overfitting by disentangling facial expressions from other factors such as identity, background, head pose, and illumination. To further address overfitting, a local AU features module transforms the global expression representation into local facial features for each AU. Finally, the local AU features are fed into an AU classifier that predicts the occurrence of each AU. The proposed method outperforms previous works and achieves state-of-the-art performance on both in-the-lab and in-the-wild datasets, whereas most existing works evaluate only on in-the-lab datasets. This superior performance stems from explicitly addressing overfitting on limited data.
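
To make the data flow concrete, here is a minimal PyTorch sketch of the global-to-local pipeline the abstract describes: a global expression encoder, per-AU local feature transforms, and per-AU occurrence classifiers. It is an illustrative reconstruction, not the authors' implementation: the module names, the tiny backbone, the layer sizes, and the choice of 12 AUs are all assumptions made here.

```python
# Minimal sketch of a global-to-local AU detector (illustrative assumptions
# throughout; the paper's actual architecture is more elaborate).
import torch
import torch.nn as nn

class GlobalToLocalAUDetector(nn.Module):
    def __init__(self, feat_dim: int = 512, num_aus: int = 12):
        super().__init__()
        # Global branch: encode a face image into a continuous expression
        # representation, intended to be isolated from identity, background,
        # head pose, and illumination.
        self.expression_encoder = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        # Local branch: one small head per AU turns the shared global
        # representation into an AU-specific local feature.
        self.local_au_modules = nn.ModuleList(
            [nn.Linear(feat_dim, 128) for _ in range(num_aus)]
        )
        # One binary classifier per AU predicts its occurrence.
        self.au_classifiers = nn.ModuleList(
            [nn.Linear(128, 1) for _ in range(num_aus)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        g = self.expression_encoder(x)   # (batch, feat_dim) global feature
        logits = [
            clf(torch.relu(loc(g)))      # (batch, 1) logit per AU
            for loc, clf in zip(self.local_au_modules, self.au_classifiers)
        ]
        return torch.cat(logits, dim=1)  # (batch, num_aus) occurrence logits

# Usage: multi-label AU occurrence probabilities for a batch of face crops;
# training would apply BCEWithLogitsLoss directly to the logits.
model = GlobalToLocalAUDetector()
faces = torch.randn(2, 3, 224, 224)
probs = torch.sigmoid(model(faces))
```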

Data Availability

All datasets used in this paper are publicly available: BP4D [49], DISFA [22], BP4D+ [50], and RAF-AU [42].

Notes

  1. https://arxiv.org/pdf/2210.15160v2.pdf

References

  1. Chen J, Wang C, Wang K et al (2022) Lightweight network architecture using difference saliency maps for facial action unit detection. Appl Intell 1–22

  2. Chen Y, Song G, Shao Z et al (2022) Geoconv: geodesic guided convolution for facial action unit recognition. Pattern Recogn 122:108355

  3. Chen ZM, Wei XS, Wang P et al (2019) Multi-label image recognition with graph convolutional networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 5177–5186

  4. Choi Y, Uh Y, Yoo J et al (2020) Stargan v2: diverse image synthesis for multiple domains. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 8188–8197

  5. Cui Z, Song T, Wang Y et al (2020) Knowledge augmented deep neural networks for joint facial expression and action unit recognition. Adv Neural Inf Process Syst 33

  6. Ekman P, Friesen W (1978) Facial action coding system: a technique for the measurement of facial movement. Consulting Psychologists Press, Palo Alto

  7. Ertugrul IÖ, Jeni LA, Cohn JF (2019) Pattnet: patch-attentive deep network for action unit detection. In: BMVC, p 114

  8. Geng Z, Cao C, Tulyakov S (2019) 3d guided fine-grained face manipulation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 9821–9830

  9. He K, Zhang X, Ren S et al (2015) Delving deep into rectifiers: surpassing human-level performance on imagenet classification. In: Proceedings of the IEEE international conference on computer vision, pp 1026–1034

  10. He K, Zhang X, Ren S et al (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778

  11. Hu X, Zhi R, Zhou C (2023) Drop-relationship learning for semi-supervised facial action unit recognition. Neurocomputing 126361

  12. Huang X, Belongie S (2017) Arbitrary style transfer in real-time with adaptive instance normalization. In: Proceedings of the IEEE international conference on computer vision, pp 1501–1510

  13. Jacob GM, Stenger B (2021) Facial action unit detection with transformers. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 7680–7689

  14. Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. Adv Neural Inf Process Syst 25

  15. Li G, Zhu X, Zeng Y et al (2019) Semantic relationships guided representation learning for facial action unit recognition. In: Proceedings of the AAAI conference on artificial intelligence, pp 8594–8601

  16. Li L, Wang S, Zhang Z et al (2021) Write-a-speaker: text-based emotional and rhythmic talking-head generation. In: Proceedings of the AAAI conference on artificial intelligence, pp 1911–1920

  17. Li W, Abtahi F, Zhu Z et al (2018) Eac-net: deep nets with enhancing and cropping for facial action unit detection. IEEE Trans Pattern Anal Mach Intell 40(11):2583–2596

  18. Liu M, Li S, Shan S et al (2015) Au-inspired deep networks for facial expression feature learning. Neurocomputing 159:126–136

  19. Liu S, Wang H (2023) Talking face generation via facial anatomy. ACM Trans Multimedia Comput Commun Appl 19(3)

  20. Luo C, Song S, Xie W et al (2022) Learning multi-dimensional edge feature-based au relation graph for facial action unit recognition. In: Raedt LD (ed) Proceedings of international joint conference on artificial intelligence, pp 1239–1246

  21. Ma C, Chen L, Yong J (2019) Au r-cnn: encoding expert prior knowledge into r-cnn for action unit detection. Neurocomputing 355:35–47

  22. Mavadati SM, Mahoor MH, Bartlett K et al (2013) Disfa: a spontaneous facial action intensity database. IEEE Trans Affect Comput 4(2):151–160

  23. Mollahosseini A, Hasani B, Mahoor MH (2017) Affectnet: a database for facial expression, valence, and arousal computing in the wild. IEEE Trans Affect Comput 10(1):18–31

  24. Niu X, Han H, Yang S et al (2019) Local relationship learning with person-specific shape regularization for facial action unit detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 11917–11926

  25. Onal Ertugrul I, Yang L, Jeni LA et al (2019) D-pattnet: dynamic patch-attentive deep network for action unit detection. Front Comput Sci 1:11

  26. Pantic M, Rothkrantz L (2004) Facial action recognition for facial expression analysis from static face images. IEEE Trans Syst Man Cybern B Cybern 34:1449–1461

  27. Paysan P, Knothe R, Amberg B et al (2009) A 3d face model for pose and illumination invariant face recognition. In: IEEE international conference on advanced video and signal based surveillance, pp 296–301

  28. Rubinow DR, Post RM (1992) Impaired recognition of affect in facial expression in depressed patients. Biol Psychiatry 31(9):947–953

  29. Shang Z, Du C, Li B et al (2023) Mma-net: multi-view mixed attention mechanism for facial action unit detection. Pattern Recogn Lett

  30. Shao Z, Liu Z, Cai J et al (2018) Deep adaptive attention for joint facial action unit detection and face alignment. In: Proceedings of the European conference on computer vision (ECCV), pp 705–720

  31. Shao Z, Liu Z, Cai J et al (2019) Facial action unit detection using attention and relation learning. IEEE Trans Affect Comput

  32. Shao Z, Liu Z, Cai J et al (2021) Jaa-net: joint facial action unit detection and face alignment via adaptive attention. Int J Comput Vis 129(2):321–340

  33. Song W, Shi S, Dong Y et al (2022) Heterogeneous spatio-temporal relation learning network for facial action unit detection. Pattern Recogn Lett 164:268–275

  34. Szegedy C, Ioffe S, Vanhoucke V et al (2017) Inception-v4, inception-resnet and the impact of residual connections on learning. In: Thirty-first AAAI conference on artificial intelligence

  35. Ulyanov D, Vedaldi A, Lempitsky V (2016) Instance normalization: The missing ingredient for fast stylization. arXiv:1607.08022

  36. Vemulapalli R, Agarwala A (2019) A compact embedding for facial expression similarity. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 5683–5692

  37. Wang S, Peng G (2019) Weakly supervised dual learning for facial action unit recognition. IEEE Trans Multimed 21(12):3218–3230

  38. Wang S, Chang Y, Wang C (2021) Dual learning for joint facial landmark detection and action unit recognition. IEEE Trans Affect Comput

  39. Xiang X, Tran TD (2017) Linear disentangled representation learning for facial actions. IEEE Trans Circuits Syst Video Technol 28(12):3539–3544

  40. Yan J, Wang J, Li Q et al (2022) Weakly supervised regional and temporal learning for facial action unit recognition. IEEE Trans Multimed

  42. Yan W, Li S, Que C et al (2020) Raf-au database: in-the-wild facial expressions with subjective emotion judgement and objective au annotations. In: Proceedings of the Asian Conference on Computer Vision (ACCV)

  43. Yang B, Wu J, Ikeda K et al (2023) Deep learning pipeline for spotting macro-and micro-expressions in long video sequences based on action units and optical flow. Pattern Recogn Lett 165:63–74

  44. Yang H, Yin L, Zhou Y et al (2021) Exploiting semantic embedding and visual feature for facial action unit detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 10482–10491

  45. Yang L, Ertugrul IO, Cohn JF et al (2019) Facs3d-net: 3d convolution based spatiotemporal representation for action unit detection. In: 2019 8th International conference on affective computing and intelligent interaction (ACII), pp 538–544

  46. Yao G, Yuan Y, Shao T et al (2021) One-shot face reenactment using appearance adaptive normalization. In: Proceedings of the AAAI conference on artificial intelligence, pp 3172–3180

  47. You R, Guo Z, Cui L et al (2020) Cross-modality attention with semantic graph embedding for multi-label classification. In: Proceedings of the AAAI conference on artificial intelligence, pp 12709–12716

  48. Zhang W, Ji X, Chen K et al (2021) Learning a facial expression embedding disentangled from identity. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 6759–6768

  49. Zhang X, Yin L, Cohn JF et al (2014) Bp4d-spontaneous: a high-resolution spontaneous 3d dynamic facial expression database. Image Vis Comput 32(10):692–706

  50. Zhang Z, Girard JM, Wu Y et al (2016) Multimodal spontaneous emotion corpus for human behavior analysis. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3438–3446

  51. Zhao K, Chu WS, De la Torre F et al (2015) Joint patch and multi-label learning for facial action unit detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2207–2216

  52. Zhao K, Chu WS, Martinez AM (2018) Learning facial action units from web images with scalable weakly supervised clustering. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2090–2099

  53. Zhi R, Liu M, Zhang D (2020) A comprehensive survey on automatic facial action unit analysis. Vis Comput 36(5):1067–1093

  54. Zhong L, Liu Q, Yang P et al (2015) Learning multiscale active facial patches for expression analysis. IEEE Trans Cybern 45(8):1499–1510

Funding

This work was supported by the 2022 Hangzhou Key Science and Technology Innovation Program (No. 2022AIZD0054) and the Key Research and Development Program of Zhejiang Province (No. 2022C01011).

Author information

Contributions

Conceptualization: Rudong An, Yu Ding, Wei Zhang, Hao Zeng, Zhigang Deng, Aobo Jin; Methodology: Rudong An, Wei Zhang, Hao Zeng, Yu Ding, Wei Chen; Investigation: Rudong An; Data curation: Aobo Jin, Wei Chen; Writing - review and editing: Rudong An, Wei Zhang, Hao Zeng, Yu Ding, Zhigang Deng, Aobo Jin, Wei Chen.

Corresponding author

Correspondence to Yu Ding.

Ethics declarations

Competing Interests

All authors declare that they have no conflicts of interest.

Ethics approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

An, R., Jin, A., Chen, W. et al. Learning facial expression-aware global-to-local representation for robust action unit detection. Appl Intell 54, 1405–1425 (2024). https://doi.org/10.1007/s10489-023-05154-7
