Measuring the intensity of spontaneous facial action units with dynamic Bayesian network
Introduction
Facial expression is one of the most common nonverbal communication media that individuals use in their daily social interactions. Analyzing facial expressions provides powerful information for describing the emotional states and psychological patterns of individuals. In the last two decades, automatic facial expression recognition has gained increasing attention across applications in developmental psychology, social robotics, affective online tutoring environments, and intelligent Human–Computer Interaction (HCI) design [1], [2].
The Facial Action Coding System (FACS) is one of the best-known approaches for describing and analyzing facial expressions [3]. FACS describes all possible facial expressions in terms of a set of anatomical facial muscle movements, called Action Units (AUs). For instance, AU12, the lip corner puller, specifies the contraction of the Zygomaticus major muscle [3]. FACS can also represent the dynamics of facial behavior by annotating the intensity of each AU on a five-point ordinal scale (levels A–E, ranging from barely visible to maximum intensity). AU intensity describes the occurrence of spontaneous facial expressions in greater detail. The general relationship between the scale of evidence and the A–B–C–D–E intensity scoring, along with some AU samples, is illustrated in Fig. 1. The A level refers to a trace of the action; B, slight evidence; C, marked or pronounced; D, severe or extreme; and E, maximum evidence. For example, “AU12B” denotes AU12 at the B intensity level. Manual FACS coding is a labor-intensive and time-consuming task, and an automatic system that can identify the AUs present and their intensities would help the community analyze spontaneous facial behaviors accurately and efficiently.
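As a concrete illustration of this notation, the following minimal Python sketch parses codes such as “AU12B” into an AU number and a numeric level; the letter-to-number mapping mirrors the common 0–5 numeric convention (0 for absent, A–E for 1–5), and the helper name is ours, not part of FACS.

```python
import re

# Map FACS intensity letters to the common 0-5 numeric convention
# (0 = AU absent, A-E = 1-5). This helper is purely illustrative.
LETTER_TO_LEVEL = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}

def parse_facs_code(code: str) -> tuple[int, int]:
    """Parse e.g. 'AU12B' into (12, 2); a bare 'AU12' yields level 0 (unscored)."""
    m = re.fullmatch(r"AU(\d+)([A-E]?)", code.strip().upper())
    if m is None:
        raise ValueError(f"not a FACS AU code: {code!r}")
    return int(m.group(1)), LETTER_TO_LEVEL.get(m.group(2), 0)

print(parse_facs_code("AU12B"))  # -> (12, 2)
```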
The majority of the existing literature focuses on two types of facial expression studies. The first category is concerned with analyzing and classifying prototypic facial expressions (also known as the six basic expressions: happiness, sadness, disgust, anger, surprise, and fear). These studies are mostly designed to recognize the basic expressions that represent human emotions; such expressions are known to be similar across different cultures [39]. The second category specifies facial behavior by a set of AUs, where the goal is to represent and recognize the facial AUs defined by FACS. The latter approach can describe a much wider range of facial expressions. AU-based analyzers are also capable of representing the prototypic expressions as combinations of AUs; for instance, ‘fear’ can be represented by the combination of AU1, 2, 4, 5, and 25 [3].
Most existing studies have focused on prototypic facial expressions and on detecting the occurrence of AUs in posed facial expressions. In many real-world applications, however, we need to analyze spontaneous facial expressions, for example when categorizing pain-related facial expressions [7] or measuring student engagement in online tutoring applications. Spontaneous facial behavior analysis can be very challenging due to several factors, such as out-of-plane head motion and varying poses, subtle facial expressions, and intra-subject variability in the dynamics and timing of different facial actions. In addition, it has been shown that the dynamics and patterns of spontaneous facial expressions can differ considerably from posed ones.
Because of these challenges, analyzing spontaneous facial expressions, especially for intensity measurement, is not yet as robust and accurate as analyzing posed ones. In early work on automatic AU intensity measurement, Bartlett et al. [13] measured the intensity of AUs in posed and spontaneous facial expressions using Gabor wavelets and support vector machines. The mean correlation of their automated system with human-coded intensity was 0.63 for posed and 0.30 for spontaneous facial behavior. These results demonstrate quantitatively that recognizing spontaneous expressions is more challenging than recognizing posed ones.
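For readers who want a concrete picture of such a baseline, here is a small, self-contained sketch in the spirit of that Gabor-plus-SVM approach. It is our reconstruction, not Bartlett et al.'s implementation: the filter bank parameters, the use of support vector regression, and the stand-in data are all assumptions.

```python
import numpy as np
from scipy.stats import pearsonr
from skimage.filters import gabor   # Gabor filter responses
from sklearn.svm import SVR         # support vector regression

def gabor_features(face: np.ndarray) -> np.ndarray:
    """Mean Gabor magnitude over a small bank of frequencies and orientations."""
    feats = []
    for frequency in (0.1, 0.2, 0.3):
        for theta in np.linspace(0, np.pi, 4, endpoint=False):
            real, imag = gabor(face, frequency=frequency, theta=theta)
            feats.append(np.hypot(real, imag).mean())
    return np.asarray(feats)

# Stand-in data: random "face crops" and random 0-5 intensity codes.
rng = np.random.default_rng(0)
faces = rng.random((40, 48, 48))
y = rng.integers(0, 6, size=40).astype(float)

X = np.stack([gabor_features(f) for f in faces])
model = SVR(kernel="rbf").fit(X[:30], y[:30])
r, _ = pearsonr(model.predict(X[30:]), y[30:])
print(f"correlation with human-coded intensity: {r:.2f}")
```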
In the area of spontaneous facial action recognition, there are very few works on detecting or measuring the intensity of spontaneous facial actions [5], [15]. To the best of the authors' knowledge, most current studies, including [5], [15], [14], analyze spontaneous facial actions statically and individually; in other words, the dependencies among multilevel AU intensities, as well as the temporal information, are ignored. The semantic and dynamic relationships among facial actions are crucial for understanding and analyzing spontaneous expressions: it is the coordinated and synchronized spatiotemporal interactions between facial actions that produce a meaningful facial expression. Tong et al. [25] employed a dynamic Bayesian network (DBN) to model the dependencies among AUs and achieved improvements over single-image-driven methods, especially for AUs that are difficult to detect but have strong relationships with other AUs. However, their work [25] focuses on AU detection in posed expressions.
Following the idea in [25], in this paper we introduce a DBN-based framework that systematically models the spatiotemporal dependencies among multiple AU intensity levels across multiple frames in order to measure the intensity of spontaneous facial actions. The proposed probabilistic framework is capable of recognizing multilevel AU intensities in spontaneous facial expressions. The Denver Intensity of Spontaneous Facial Action (DISFA) database [9], which is publicly available for analyzing AU intensities and their dynamics, is employed in this study: for every frame in DISFA, the intensity of every AU is provided on a scale from 0 (absence of the AU) to 5 (maximum intensity). To demonstrate the effectiveness of the proposed model, rigorous experiments are performed on the DISFA database, and the experimental results, together with a detailed analysis of the improvements, are reported in this paper.
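To make the temporal part of this idea concrete, the sketch below implements forward filtering over a single AU's six intensity levels, fusing noisy frame-wise evidence (e.g., calibrated SVM scores) through a persistence-favoring transition model. This is only a one-chain slice of a DBN: the paper's full model also encodes dependencies among different AUs within a frame, and the transition matrix and evidence used here are our own assumptions.

```python
import numpy as np

N = 6  # intensity levels 0..5

# Transition matrix: intensities tend to persist or move by one level per
# frame (assumption: smooth intensity changes between consecutive frames).
T = np.array([[np.exp(-abs(i - j)) for j in range(N)] for i in range(N)])
T /= T.sum(axis=1, keepdims=True)

def filter_sequence(obs_likelihoods: np.ndarray) -> np.ndarray:
    """obs_likelihoods: (frames, 6) per-frame measurement likelihoods.
    Returns the filtered posterior over intensity levels per frame."""
    belief = np.full(N, 1.0 / N)
    out = []
    for like in obs_likelihoods:
        belief = T.T @ belief    # predict from the previous frame
        belief *= like           # update with the current frame's evidence
        belief /= belief.sum()
        out.append(belief.copy())
    return np.array(out)

# Noisy frame-wise evidence for an intensity ramp 0 -> 3 -> 0.
rng = np.random.default_rng(1)
true_levels = [0, 0, 1, 2, 3, 3, 2, 1, 0]
base = np.array([0.5, 0.2, 0.1, 0.1, 0.05, 0.05])
obs = np.array([np.roll(base, k) for k in true_levels])
obs += 0.05 * rng.random(obs.shape)

posteriors = filter_sequence(obs)
print(posteriors.argmax(axis=1))  # temporally smoothed intensity estimates
```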
Section snippets
Related works
Given the significant role of faces in humans' emotional and social lives, automating the analysis of facial expression has gained great attention in both academia and industry. An automated facial expression recognition system usually consists of two key stages: feature extraction and machine learning algorithm design for classification. Commonly used features that represent facial gestures or facial movements include optical flow [35], [41] and explicit feature measurements (e.g., length of …
AU intensity observation extraction
In this section, we describe our AU intensity image observation extraction method, which consists of face registration, facial image representation, dimensionality reduction, and SVM classification.
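A hedged sketch of such a frame-wise observation extractor is given below. HOG and PCA stand in for whatever facial representation and dimensionality reduction the full paper adopts, the face crops are assumed to be already registered, and the data are synthetic placeholders.

```python
import numpy as np
from skimage.feature import hog
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

def describe(crop: np.ndarray) -> np.ndarray:
    """Appearance descriptor for one registered face crop (HOG as a stand-in)."""
    return hog(crop, orientations=8, pixels_per_cell=(8, 8), cells_per_block=(2, 2))

# Stand-in data: random "registered" crops and random 0-5 intensity labels.
rng = np.random.default_rng(0)
crops = rng.random((60, 64, 64))
levels = rng.integers(0, 6, size=60)

X = np.stack([describe(c) for c in crops])
clf = make_pipeline(PCA(n_components=20), SVC(kernel="rbf", probability=True))
clf.fit(X[:45], levels[:45])
proba = clf.predict_proba(X[45:])  # per-frame intensity evidence for the DBN
```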
AU intensity correlation analysis
Measuring the intensity of AUs in a single video frame is difficult due to the variety, ambiguity, and dynamic nature of facial actions; this is especially true for spontaneous facial expressions. Moreover, when AUs occur in combination, they may be non-additive, meaning that the appearance of an AU in a combination differs from its stand-alone appearance. Fig. 3 demonstrates an example of this non-additive effect: when AU12 (lip corner puller) appears alone, the lip corners are pulled up obliquely …
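The kind of dependency this section exploits can be probed with a simple co-occurrence analysis: given frame-wise AU intensity labels such as those shipped with DISFA, compute pairwise correlations between AU intensity tracks and treat strongly correlated pairs as candidate links for the DBN. The sketch below does exactly this on synthetic labels; the paper's actual dependency modeling is richer.

```python
import numpy as np

# Synthetic frame-wise intensity labels (frames x 12 AUs, values 0-5),
# with one AU pair forced to co-vary so the analysis finds something.
rng = np.random.default_rng(2)
frames, n_aus = 500, 12
labels = rng.integers(0, 6, size=(frames, n_aus)).astype(float)
labels[:, 1] = np.clip(labels[:, 0] + rng.normal(0, 0.5, frames), 0, 5)

corr = np.corrcoef(labels, rowvar=False)  # (12, 12) AU-by-AU correlation
i, j = np.unravel_index(np.abs(np.triu(corr, 1)).argmax(), corr.shape)
print(f"most correlated pair: AU columns {i} and {j}, r = {corr[i, j]:.2f}")
```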
Experimental results
In our experiments we utilize the DISFA database to evaluate the performance of automatic measurement of the intensity of spontaneous action units. We first introduce the contents of DISFA and then report the results of the proposed system for measuring the intensity of 12 AUs in this database.
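A minimal per-AU evaluation in the style typically reported for DISFA compares predicted intensities against the manual 0–5 codes, one correlation per AU; the sketch below uses stand-in arrays in place of real predictions and ground truth.

```python
import numpy as np
from scipy.stats import pearsonr

# Stand-ins for per-frame ground truth and predictions: (frames, 12 AUs).
rng = np.random.default_rng(3)
truth = rng.integers(0, 6, size=(300, 12)).astype(float)
pred = np.clip(truth + rng.normal(0, 1.0, truth.shape), 0, 5)

for au in range(truth.shape[1]):
    r, _ = pearsonr(pred[:, au], truth[:, au])  # one correlation per AU
    print(f"AU column {au}: r = {r:.2f}")
```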
Conclusions and future work
Due to the richness, ambiguity, and dynamic nature of facial actions, individually and statically recognizing each AU intensity is not always accurate and reliable for spontaneous facial expressions. Hence, improving the recognition system's performance requires not only improving the observation extraction accuracy but, more importantly, exploiting the spatiotemporal interactions among facial actions, since it is these coherent, coordinated, and synchronized interactions that produce a meaningful facial expression.
Conflict of interest
None declared.
Acknowledgements
This work was mainly accomplished while the first author was visiting Rensselaer Polytechnic Institute (RPI) as a visiting student, and was partially supported by National Natural Science Foundation of China Funded Project 61402129, Heilongjiang Postdoctoral Science Foundation Funded Project LBH-Z14090, and awards IIP-1111568 and BCS-1052781 from the National Science Foundation to the University of Denver.
References (47)
- et al., Event related potentials and the perception of intensity in facial expressions, Neuropsychologia (2006)
- et al., Detection, tracking, and classification of action units in facial expression, Robot. Auton. Syst. (2000)
- et al., Facial expression recognition based on local binary patterns: a comprehensive study, Image Vis. Comput. (2009)
- et al., Recognizing faces with PCA and ICA, Comput. Vis. Image Underst. (2003)
- et al., Automated facial action coding system for dynamic analysis of facial expressions in neuropsychiatric disorders, J. Neurosci. Methods (2011)
- et al., Regression-based intensity estimation of facial action units, Image Vis. Comput. (2012)
- C. Breazeal, Sociable machines: expressive social exchange between humans and robots (Sc.D. dissertation), Department...
- F. Dornaika, B. Raducanu, Facial Expression Recognition for HCI Applications, Prentice Hall Computer Applications in...
- P. Ekman, W.V. Friesen, J.C. Hager, Facial Action Coding System, A Human Face, Salt Lake City, UT,...
- M.S. Bartlett, G. Littlewort, M. Frank, C. Lainscsek, I. Fasel, J. Movellan, Recognizing facial expression: machine...
- DISFA: a spontaneous facial action intensity database, IEEE Trans. Affect. Comput.
- Appearance manifold of facial expression, Computer Vision in Human–Computer Interaction
- Laplacian eigenmaps for dimensionality reduction and data representation, Neural Comput.
- An overview of statistical learning theory, IEEE Trans. Neural Netw.
- Facial action unit recognition by exploiting their dynamic and semantic relationships, IEEE Trans. Pattern Anal. Mach. Intell.
- Estimating the dimension of a model, Ann. Stat.
Yongqiang Li received the B.S., M.S. and Ph.D. degrees in instrument science and technology from Harbin Institute of Technology, Harbin, China, in 2007, 2009 and 2014, respectively. He is currently an Assistant Professor at Harbin Institute of Technology. He worked as a visiting student at Rensselaer Polytechnic Institute, Troy, USA, from September 2010–September 2012. His areas of research include computer vision, pattern recognition, and human-computer interaction.
S. Mohammad Mavadati received the B.Sc. degree in electronics engineering from the Shahrood University of Technology, Iran, in September 2007, and the M.Sc. degree in telecommunication engineering from Yazd University, Iran, in March 2010. He is currently working toward the Ph.D. degree and is a Graduate Research Assistant in the Department of Electrical and Computer Engineering at the University of Denver. His research interests include automatic analysis of facial expression, pattern classification, and computer vision. He is a student member of both the IEEE and the IEEE Signal Processing Society.
Mohammad H. Mahoor received the B.S. degree in electronics from the Abadan Institute of Technology, Iran, in 1996, the M.S. degree in biomedical engineering from the Sharif University of Technology, Iran, in 1998, and the Ph.D. degree in electrical and computer engineering from the University of Miami, Florida, in 2007. He joined the University of Denver (DU) as an Assistant Professor of computer engineering in September 2008. He has authored or coauthored more than 60 refereed research publications. He is the director of image processing and computer vision laboratory at DU. His research interests include affective computing and developing automated systems for facial expression recognition. He is a member of the IEEE.
Yongping Zhao received the Ph.D. degree in electrical engineering from Harbin Institute of Technology, Harbin, China. He is currently a Professor with the Department of Instrument Science and Technology at Harbin Institute of Technology, Harbin, China. His areas of research include signal processing, system integration and pattern recognition.
Qiang Ji received his Ph.D. degree in electrical engineering from the University of Washington. He is currently a Professor with the Department of Electrical, Computer, and Systems Engineering at Rensselaer Polytechnic Institute (RPI). He recently served as a Program Director at the National Science Foundation (NSF), where he managed NSF's computer vision and machine learning programs. He also held teaching and research positions with the Beckman Institute at University of Illinois at Urbana-Champaign, the Robotics Institute at Carnegie Mellon University, the Department of Computer Science at University of Nevada at Reno, and the US Air Force Research Laboratory. Prof. Ji currently serves as the Director of the Intelligent Systems Laboratory (ISL) at RPI.
Prof. Ji's research interests are in computer vision, probabilistic graphical models, information fusion, and their applications in various fields. He has published over 160 papers in peer-reviewed journals and conferences. His research has been supported by major governmental agencies including NSF, NIH, DARPA, ONR, ARO, and AFOSR as well as by major companies including Honda and Boeing. Prof. Ji is an Editor on several related IEEE and international journals and he has served as a General Chair, Program Chair, Technical Area Chair, and Program Committee Member in numerous international conferences/workshops. Prof. Ji is a Fellow of IEEE and IAPR.