Measuring the intensity of spontaneous facial action units with dynamic Bayesian network
Introduction
Facial expression is one of the most common nonverbal communication media that individuals use in their daily social interactions. Analyzing facial expressions provides powerful information for describing the emotional states and psychological patterns of individuals. In the last two decades, automatic facial expression recognition has gained increasing attention across applications in developmental psychology, social robotics, affective online tutoring environments, and intelligent Human–Computer Interaction (HCI) design [1], [2].
The Facial Action Coding System (FACS) is one of the best-known approaches for describing and analyzing facial expressions [3]. FACS describes all possible facial expressions in terms of a set of anatomical facial muscle movements, called Action Units (AUs). For instance, AU12, the lip corner puller, specifies the contraction of the Zygomaticus major muscle [3]. FACS can also represent the dynamics of facial behavior by annotating the intensity of each AU on a five-point ordinal scale (levels A–E, ranging from barely visible to maximum intensity). AU intensity describes the occurrence of spontaneous facial expressions in greater detail. The general relationship between the scale of evidence and the A–B–C–D–E intensity scoring, along with some AU samples, is illustrated in Fig. 1. The A level refers to a trace of the action; B, slight evidence; C, marked or pronounced; D, severe or extreme; and E, maximum evidence. For example, “AU12B” denotes AU12 at the B intensity level. Manual FACS coding is a labor-intensive and time-consuming task, and an automatic system that can identify the AUs present and their intensities would help the community analyze spontaneous facial behaviors accurately and efficiently.
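As a concrete illustration of this notation, the following minimal Python sketch parses codes such as “AU12B” into an AU number and a numeric level; the letter-to-number mapping mirrors the common 0–5 numeric convention (0 for absent, A–E for 1–5), and the helper name is ours, not part of FACS.

```python
import re

# Map FACS intensity letters to the common 0-5 numeric convention
# (0 = AU absent, A-E = 1-5). This helper is purely illustrative.
LETTER_TO_LEVEL = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}

def parse_facs_code(code: str) -> tuple[int, int]:
    """Parse e.g. 'AU12B' into (12, 2); a bare 'AU12' yields level 0 (unscored)."""
    m = re.fullmatch(r"AU(\d+)([A-E]?)", code.strip().upper())
    if m is None:
        raise ValueError(f"not a FACS AU code: {code!r}")
    return int(m.group(1)), LETTER_TO_LEVEL.get(m.group(2), 0)

print(parse_facs_code("AU12B"))  # -> (12, 2)
```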
The majority of the existing literature focuses on two types of facial expression studies. The first category is concerned with analyzing and classifying prototypic facial expressions (also known as the six basic expressions: happiness, sadness, disgust, anger, surprise, and fear). These studies are mostly designed to recognize the basic expressions that represent human emotions; such expressions are known to be similar across different cultures [39]. The second category specifies facial behavior by a set of AUs, where the goal is to represent and recognize the facial AUs defined by FACS. The latter approach can describe a much wider range of facial expressions. AU-based analyzers are also capable of representing the prototypic expressions as combinations of AUs; for instance, ‘fear’ can be represented by the combination of AU1, 2, 4, 5, and 25 [3].
Most existing studies have focused on prototypic facial expressions and on detecting the occurrence of AUs in posed facial expressions. In many real-world applications, however, we need to analyze spontaneous facial expressions, for example when categorizing pain-related facial expressions [7] or measuring student engagement in online tutoring applications. Spontaneous facial behavior analysis can be very challenging due to several factors, such as out-of-plane head motion and varying poses, subtle facial expressions, and intra-subject variability in the dynamics and timing of different facial actions. In addition, it has been shown that the dynamics and patterns of spontaneous facial expressions can differ considerably from posed ones.
Because of these challenges, analyzing spontaneous facial expressions, especially for intensity measurement, is not yet as robust and accurate as analyzing posed ones. In early work on automatic AU intensity measurement, Bartlett et al. [13] measured the intensity of AUs in posed and spontaneous facial expressions using Gabor wavelets and support vector machines. The mean correlation of their automated system with human-coded intensity was 0.63 for posed and 0.30 for spontaneous facial behavior. These results demonstrate quantitatively that recognizing spontaneous expressions is more challenging than recognizing posed ones.
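For readers who want a concrete picture of such a baseline, here is a small, self-contained sketch in the spirit of that Gabor-plus-SVM approach. It is our reconstruction, not Bartlett et al.'s implementation: the filter bank parameters, the use of support vector regression, and the stand-in data are all assumptions.

```python
import numpy as np
from scipy.stats import pearsonr
from skimage.filters import gabor   # Gabor filter responses
from sklearn.svm import SVR         # support vector regression

def gabor_features(face: np.ndarray) -> np.ndarray:
    """Mean Gabor magnitude over a small bank of frequencies and orientations."""
    feats = []
    for frequency in (0.1, 0.2, 0.3):
        for theta in np.linspace(0, np.pi, 4, endpoint=False):
            real, imag = gabor(face, frequency=frequency, theta=theta)
            feats.append(np.hypot(real, imag).mean())
    return np.asarray(feats)

# Stand-in data: random "face crops" and random 0-5 intensity codes.
rng = np.random.default_rng(0)
faces = rng.random((40, 48, 48))
y = rng.integers(0, 6, size=40).astype(float)

X = np.stack([gabor_features(f) for f in faces])
model = SVR(kernel="rbf").fit(X[:30], y[:30])
r, _ = pearsonr(model.predict(X[30:]), y[30:])
print(f"correlation with human-coded intensity: {r:.2f}")
```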
In the area of spontaneous facial action recognition, there are very few works on detecting or measuring the intensity of spontaneous facial actions [5], [15]. To the best of the authors' knowledge, most current studies, including [5], [15], [14], analyze spontaneous facial actions statically and individually; in other words, the dependencies among multilevel AU intensities, as well as the temporal information, are ignored. The semantic and dynamic relationships among facial actions are crucial for understanding and analyzing spontaneous expressions: it is the coordinated and synchronized spatiotemporal interactions between facial actions that produce a meaningful facial expression. Tong et al. [25] employed a dynamic Bayesian network (DBN) to model the dependencies among AUs and achieved improvements over single-image-driven methods, especially for AUs that are difficult to detect but have strong relationships with other AUs. However, their work [25] focuses on AU detection in posed expressions.
Following the idea in [25], in this paper we introduce a DBN-based framework that systematically models the spatiotemporal dependencies among multiple AU intensity levels across multiple frames in order to measure the intensity of spontaneous facial actions. The proposed probabilistic framework is capable of recognizing multilevel AU intensities in spontaneous facial expressions. The Denver Intensity of Spontaneous Facial Action (DISFA) database [9], which is publicly available for analyzing AU intensities and their dynamics, is employed in this study: for every frame in DISFA, the intensity of every AU is provided on a scale from 0 (absence of the AU) to 5 (maximum intensity). To demonstrate the effectiveness of the proposed model, rigorous experiments are performed on the DISFA database, and the experimental results, together with a detailed analysis of the improvements, are reported in this paper.
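To make the temporal part of this idea concrete, the sketch below implements forward filtering over a single AU's six intensity levels, fusing noisy frame-wise evidence (e.g., calibrated SVM scores) through a persistence-favoring transition model. This is only a one-chain slice of a DBN: the paper's full model also encodes dependencies among different AUs within a frame, and the transition matrix and evidence used here are our own assumptions.

```python
import numpy as np

N = 6  # intensity levels 0..5

# Transition matrix: intensities tend to persist or move by one level per
# frame (assumption: smooth intensity changes between consecutive frames).
T = np.array([[np.exp(-abs(i - j)) for j in range(N)] for i in range(N)])
T /= T.sum(axis=1, keepdims=True)

def filter_sequence(obs_likelihoods: np.ndarray) -> np.ndarray:
    """obs_likelihoods: (frames, 6) per-frame measurement likelihoods.
    Returns the filtered posterior over intensity levels per frame."""
    belief = np.full(N, 1.0 / N)
    out = []
    for like in obs_likelihoods:
        belief = T.T @ belief    # predict from the previous frame
        belief *= like           # update with the current frame's evidence
        belief /= belief.sum()
        out.append(belief.copy())
    return np.array(out)

# Noisy frame-wise evidence for an intensity ramp 0 -> 3 -> 0.
rng = np.random.default_rng(1)
true_levels = [0, 0, 1, 2, 3, 3, 2, 1, 0]
base = np.array([0.5, 0.2, 0.1, 0.1, 0.05, 0.05])
obs = np.array([np.roll(base, k) for k in true_levels])
obs += 0.05 * rng.random(obs.shape)

posteriors = filter_sequence(obs)
print(posteriors.argmax(axis=1))  # temporally smoothed intensity estimates
```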
Section snippets
Related works
Given the significant role of faces in humans' emotional and social lives, automating the analysis of facial expression has gained great attention in both academia and industry. An automated facial expression recognition system usually consists of two key stages: feature extraction and machine learning algorithm design for classification. Commonly used features that represent facial gestures or facial movements include optical flow [35], [41] and explicit feature measurements (e.g., length of …
AU intensity observation extraction
In this section, we describe our AU intensity image observation extraction method, which consists of face registration, facial image representation, dimensionality reduction, and SVM classification.
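A hedged sketch of such a frame-wise observation extractor is given below. HOG and PCA stand in for whatever facial representation and dimensionality reduction the full paper adopts, the face crops are assumed to be already registered, and the data are synthetic placeholders.

```python
import numpy as np
from skimage.feature import hog
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

def describe(crop: np.ndarray) -> np.ndarray:
    """Appearance descriptor for one registered face crop (HOG as a stand-in)."""
    return hog(crop, orientations=8, pixels_per_cell=(8, 8), cells_per_block=(2, 2))

# Stand-in data: random "registered" crops and random 0-5 intensity labels.
rng = np.random.default_rng(0)
crops = rng.random((60, 64, 64))
levels = rng.integers(0, 6, size=60)

X = np.stack([describe(c) for c in crops])
clf = make_pipeline(PCA(n_components=20), SVC(kernel="rbf", probability=True))
clf.fit(X[:45], levels[:45])
proba = clf.predict_proba(X[45:])  # per-frame intensity evidence for the DBN
```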
AU intensity correlation analysis
Measuring the intensity of AUs in a single video frame is difficult due to the variety, ambiguity, and dynamic nature of facial actions; this is especially true for spontaneous facial expressions. Moreover, when AUs occur in combination, they may be non-additive, meaning that the appearance of an AU in a combination differs from its stand-alone appearance. Fig. 3 demonstrates an example of this non-additive effect: when AU12 (lip corner puller) appears alone, the lip corners are pulled up obliquely …
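The kind of dependency this section exploits can be probed with a simple co-occurrence analysis: given frame-wise AU intensity labels such as those shipped with DISFA, compute pairwise correlations between AU intensity tracks and treat strongly correlated pairs as candidate links for the DBN. The sketch below does exactly this on synthetic labels; the paper's actual dependency modeling is richer.

```python
import numpy as np

# Synthetic frame-wise intensity labels (frames x 12 AUs, values 0-5),
# with one AU pair forced to co-vary so the analysis finds something.
rng = np.random.default_rng(2)
frames, n_aus = 500, 12
labels = rng.integers(0, 6, size=(frames, n_aus)).astype(float)
labels[:, 1] = np.clip(labels[:, 0] + rng.normal(0, 0.5, frames), 0, 5)

corr = np.corrcoef(labels, rowvar=False)  # (12, 12) AU-by-AU correlation
i, j = np.unravel_index(np.abs(np.triu(corr, 1)).argmax(), corr.shape)
print(f"most correlated pair: AU columns {i} and {j}, r = {corr[i, j]:.2f}")
```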
Experimental results
In our experiments we utilize the DISFA database to evaluate the performance of automatic measurement of the intensity of spontaneous action units. We first introduce the contents of DISFA and then report the results of the proposed system for measuring the intensity of 12 AUs in this database.
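A minimal per-AU evaluation in the style typically reported for DISFA compares predicted intensities against the manual 0–5 codes, one correlation per AU; the sketch below uses stand-in arrays in place of real predictions and ground truth.

```python
import numpy as np
from scipy.stats import pearsonr

# Stand-ins for per-frame ground truth and predictions: (frames, 12 AUs).
rng = np.random.default_rng(3)
truth = rng.integers(0, 6, size=(300, 12)).astype(float)
pred = np.clip(truth + rng.normal(0, 1.0, truth.shape), 0, 5)

for au in range(truth.shape[1]):
    r, _ = pearsonr(pred[:, au], truth[:, au])  # one correlation per AU
    print(f"AU column {au}: r = {r:.2f}")
```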
Conclusions and future work
Due to the richness, ambiguity, and dynamic nature of facial actions, individually and statically recognizing each AU intensity is not always accurate and reliable for spontaneous facial expressions. Hence, improving the recognition system's performance requires not only improving the observation extraction accuracy but, more importantly, exploiting the spatiotemporal interactions among facial actions, since it is these coherent, coordinated, and synchronized interactions that produce a meaningful facial expression.
Conflict of interest
None declared.
Acknowledgements
This work was mainly accomplished while the first author was visiting Rensselaer Polytechnic Institute (RPI) as a visiting student, and was partially supported by National Natural Science Foundation of China Funded Project 61402129, Heilongjiang Postdoctoral Science Foundation Funded Project LBH-Z14090, and awards IIP-1111568 and BCS-1052781 from the National Science Foundation to the University of Denver.
References (47)
- et al., Event related potentials and the perception of intensity in facial expressions, Neuropsychologia (2006)
- et al., Detection, tracking, and classification of action units in facial expression, Robot. Auton. Syst. (2000)
- et al., Facial expression recognition based on local binary patterns: a comprehensive study, Image Vis. Comput. (2009)
- et al., Recognizing faces with PCA and ICA, Comput. Vis. Image Underst. (2003)
- et al., Automated facial action coding system for dynamic analysis of facial expressions in neuropsychiatric disorders, J. Neurosci. Methods (2011)
- et al., Regression-based intensity estimation of facial action units, Image Vis. Comput. (2012)
- C. Breazeal, Sociable machines: expressive social exchange between humans and robots (Sc.D. dissertation), Department...
- F. Dornaika, B. Raducanu, Facial Expression Recognition for HCI Applications, Prentice Hall Computer Applications in...
- P. Ekman, W.V. Friesen, J.C. Hager, Facial Action Coding System, A Human Face, Salt Lake City, UT,...
- M.S. Bartlett, G. Littlewort, M. Frank, C. Lainscsek, I. Fasel, J. Movellan, Recognizing facial expression: machine...
- DISFA: a spontaneous facial action intensity database, IEEE Trans. Affect. Comput.
- Appearance manifold of facial expression, Computer Vision in Human–Computer Interaction
- Laplacian eigenmaps for dimensionality reduction and data representation, Neural Comput.
- An overview of statistical learning theory, IEEE Trans. Neural Netw.
- Facial action unit recognition by exploiting their dynamic and semantic relationships, IEEE Trans. Pattern Anal. Mach. Intell.
- Estimating the dimension of a model, Ann. Stat.
Yongqiang Li received the B.S., M.S. and Ph.D. degrees in instrument science and technology from Harbin Institute of Technology, Harbin, China, in 2007, 2009 and 2014, respectively. He is currently an Assistant Professor at Harbin Institute of Technology. He worked as a visiting student at Rensselaer Polytechnic Institute, Troy, USA, from September 2010–September 2012. His areas of research include computer vision, pattern recognition, and human-computer interaction.
S. Mohammad Mavadati received the B.Sc. degree in electronics engineering from the Shahrood University of Technology, Iran, in September 2007, and the M.Sc. degree in telecommunication engineering from Yazd University, Iran, in March 2010. He is currently working toward the Ph.D. degree and is a Graduate Research Assistant in the Department of Electrical and Computer Engineering at the University of Denver. His research interests include automatic analysis of facial expression, pattern classification, and computer vision. He is a student member of both the IEEE and the IEEE Signal Processing Society.
Mohammad H. Mahoor received the B.S. degree in electronics from the Abadan Institute of Technology, Iran, in 1996, the M.S. degree in biomedical engineering from the Sharif University of Technology, Iran, in 1998, and the Ph.D. degree in electrical and computer engineering from the University of Miami, Florida, in 2007. He joined the University of Denver (DU) as an Assistant Professor of computer engineering in September 2008. He has authored or coauthored more than 60 refereed research publications. He is the director of image processing and computer vision laboratory at DU. His research interests include affective computing and developing automated systems for facial expression recognition. He is a member of the IEEE.
Yongping Zhao received the Ph.D. degree in electrical engineering from Harbin Institute of Technology, Harbin, China. He is currently a Professor with the Department of Instrument Science and Technology at Harbin Institute of Technology, Harbin, China. His areas of research include signal processing, system integration and pattern recognition.
Qiang Ji received his Ph.D. degree in electrical engineering from the University of Washington. He is currently a Professor with the Department of Electrical, Computer, and Systems Engineering at Rensselaer Polytechnic Institute (RPI). He recently served as a Program Director at the National Science Foundation (NSF), where he managed NSF's computer vision and machine learning programs. He also held teaching and research positions with the Beckman Institute at University of Illinois at Urbana-Champaign, the Robotics Institute at Carnegie Mellon University, the Department of Computer Science at University of Nevada at Reno, and the US Air Force Research Laboratory. Prof. Ji currently serves as the Director of the Intelligent Systems Laboratory (ISL) at RPI.
Prof. Ji's research interests are in computer vision, probabilistic graphical models, information fusion, and their applications in various fields. He has published over 160 papers in peer-reviewed journals and conferences. His research has been supported by major governmental agencies including NSF, NIH, DARPA, ONR, ARO, and AFOSR as well as by major companies including Honda and Boeing. Prof. Ji is an Editor on several related IEEE and international journals and he has served as a General Chair, Program Chair, Technical Area Chair, and Program Committee Member in numerous international conferences/workshops. Prof. Ji is a Fellow of IEEE and IAPR.