ABSTRACT
Facial Action Unit (AU) detection uses computer vision and machine learning techniques to identify the activation of specific facial muscle movements. Because combinations of AUs describe and quantify changes in facial expression, AU detection is an important task in facial attribute analysis. In recent years, deep learning has driven significant progress in facial AU detection. However, most research has focused on single-task settings, training dedicated models for AU detection alone. This overlooks the relationships between AUs and other facial attributes, limiting robustness to noise and adaptability. In this paper, a multi-task learning method is proposed for facial AU detection: the model jointly learns facial AU detection, facial landmark detection, and facial emotion recognition. By sharing the underlying network, the model learns more general feature representations, improving its generalization ability. In addition, the landmark coordinates produced by the landmark-detection branch provide attention maps for the AUs, which suppress interference from irrelevant facial regions and improve detection performance. The proposed method achieves competitive results on the widely used BP4D dataset.
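The abstract describes using predicted landmark coordinates to build attention maps that focus the AU branch on relevant facial regions. The following is a minimal NumPy sketch of that one step, not the authors' implementation: the function name, map size, and Gaussian spread (`sigma`) are illustrative assumptions, with each landmark contributing a 2-D Gaussian bump that is summed and normalized.

```python
import numpy as np

def landmark_attention_map(landmarks, size=64, sigma=4.0):
    """Build a spatial attention map from facial landmark coordinates.

    Each landmark contributes a 2-D Gaussian; the contributions are summed
    and normalized so the peak equals 1, letting the map multiplicatively
    reweight the shared feature maps for a given AU.

    landmarks: array of (x, y) pixel coordinates in [0, size).
    """
    ys, xs = np.mgrid[0:size, 0:size]
    att = np.zeros((size, size), dtype=np.float64)
    for x, y in landmarks:
        # Gaussian bump centered on this landmark
        att += np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
    att /= att.max()  # normalize: strongest response becomes 1
    return att

# Hypothetical example: three landmarks along the brow region
# (the area most relevant to AU1/AU2, inner/outer brow raiser).
lm = np.array([[20.0, 16.0], [32.0, 14.0], [44.0, 16.0]])
att = landmark_attention_map(lm)
```

In a multi-task network, a map like `att` would be computed per AU from the landmark branch's output and multiplied into the shared features before the AU classification head, so the head attends to the region where that AU occurs.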