Multi-modal emotion analysis from facial expressions and electroencephalogram
Introduction
Emotions are a central part of human communication. They are fundamental to humans, impacting our perception and everyday activities such as communication, learning and decision-making. It is widely agreed that emotion is a multi-modal process involving facial expressions, speech, gestures and some physiological characteristics, as shown in Fig. 1, and that it should play a key role in human–computer interaction [10], [31], [52]. Application scenarios include analyzing emotions while a person is watching emotional movies or advertisements, playing video games, driving a car, undergoing health monitoring or crime investigation, or participating in interactive tutoring.
As computers are expected to interact naturally with humans, an emotion recognition technique should be able to process, extract and analyze a variety of cues through a multi-modal procedure. Recently, multi-modal emotion recognition has attracted significant scientific interest [3], [32], [44], [45]. These works utilized various channels, including facial expressions, speech and physiological signals. Among these, facial expression is an intuitive, external measurement for computers to understand human emotions, while the electroencephalogram (EEG) is an internal measure of brain activity, making it an interesting complement for multi-modal emotion recognition. So far, few works have attempted to consider facial expression and EEG together for spontaneous emotion recognition [21]. This paper proposes a new approach for multi-modal emotion recognition that fuses facial expression and EEG to recognize emotions from long continuous videos.
Facial expression is probably the most important non-verbal communication channel. Facial expressions have been directly linked to the emotional state experienced by the sender [10] and have been shown to be an important source of information regarding the emotional state of others. They can reveal how people are feeling and what their attitude and behavioral intentions are.
Over recent decades, research on facial expression analysis has progressed from posed (acted) to spontaneous facial expressions [52], from isolated to continuous expressions [19], and from obvious to subtle expressions [39]. Recent studies [38], [48] have extensively investigated spontaneous facial expressions, because they are more closely related to the true emotions of human beings than acted facial expressions. Technically, geometry-based and appearance-based features are the two common ways to analyze spontaneous facial expressions [18], [26], [49]. Specifically, geometry-based features represent the face geometry, such as the shapes and locations of facial landmarks, which are obtained by an active shape model or an active appearance model. On the other hand, appearance-based features describe the skin texture of faces, such as wrinkles and furrows [15], [30], [53]. However, as an indicator of emotion, facial expression alone may not provide sufficiently informative characteristics of a person's affective status [45], [52]. The true expression is affected by the context of the social situation, such as cultural differences [9]. As a result, one needs to use information from different modalities to increase the accuracy.
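To make the appearance-based family concrete, the sketch below implements the basic 3×3 local binary pattern (LBP) operator with NumPy and summarizes an image as a normalized code histogram. This is only a minimal illustration of the descriptor family; it is not the exact feature pipeline used in the paper.

```python
import numpy as np

def lbp_histogram(gray, bins=256):
    """Basic 3x3 local binary pattern: threshold each pixel's 8
    neighbours against the centre, pack the comparisons into an
    8-bit code, and summarise the image as a normalised histogram."""
    g = np.asarray(gray, dtype=np.int32)
    c = g[1:-1, 1:-1]                       # centre pixels (border skipped)
    # 8 neighbours, enumerated clockwise from the top-left
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(c)
    for bit, (dy, dx) in enumerate(offsets):
        nb = g[1 + dy:g.shape[0] - 1 + dy, 1 + dx:g.shape[1] - 1 + dx]
        codes |= ((nb >= c).astype(np.int32) << bit)
    hist, _ = np.histogram(codes, bins=bins, range=(0, bins))
    return hist / hist.sum()                # normalised code histogram
```

In practice, such histograms are usually computed over a grid of face regions and concatenated, so that the descriptor retains some spatial information.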
Recently, physiological signals have also been used to recognize emotions [22], [47], since signals such as the EEG and the electromyogram (EMG) can reveal emotion through physical changes. Kolodyazhniy et al. [22] used features from peripheral physiological signals to represent neutral, fear, and sadness responses to movie excerpts. In [47], Takahashi et al. collected EEG and peripheral physiological signals from 12 participants and classified their responses to emotional videos into five classes: joy, sadness, disgust, fear, and relaxed. While conveying important affective information, EEG signals are difficult to control voluntarily [6]. Moreover, EEG, which reflects cortical electrical activity, has been shown to provide informative characteristics in response to emotional states [29], [34], [46], [54].
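A common way to turn an EEG channel into emotion-relevant features is the spectral power in conventional frequency bands. The sketch below computes band powers from a plain periodogram with NumPy; the band boundaries are conventional values that vary between studies, and this is an illustration rather than the paper's exact SP feature extraction.

```python
import numpy as np

# Conventional EEG frequency bands in Hz; exact boundaries vary by study.
BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30), "gamma": (30, 45)}

def band_powers(signal, fs):
    """Average spectral power of one EEG channel in each band, taken
    from the squared magnitude of the real FFT (a plain periodogram)."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    psd = np.abs(np.fft.rfft(signal)) ** 2 / len(signal)
    return {name: psd[(freqs >= lo) & (freqs < hi)].mean()
            for name, (lo, hi) in BANDS.items()}
```

For example, a pure 10 Hz sinusoid sampled at 128 Hz concentrates its power in the alpha band. In practice a windowed estimator such as Welch's method is preferred over the raw periodogram for noisy EEG.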
In recent years, several studies have attempted to fuse facial expressions and physiological signals. In [3], accuracies of 93% and 89% were obtained when using facial expressions to recognize amusement and sadness, respectively, while the accuracy for classifying these emotions with physiological signals (including heart rate, systolic blood pressure and skin conductance level) was 82%. Combining facial expressions and physiological signals improved the accuracies to 94% and 98% for amusement and sadness, respectively. In [6], Chang et al. obtained recognition rates of 90% and 88.33% for facial expressions and physiological signals (including skin conductivity, finger temperature and heart rate), respectively, while combining the modalities resulted in a rate of 95%. These results indicate that physiological signals can substantially contribute to multi-modal emotion recognition. In [51], Wesley et al. combined a physiological and a visual information channel for user studies, using a thermal imaging system to obtain a physiological signal from the face. Furthermore, in [40], Pavlidis et al. applied the work of [51] in a longitudinal human performance study. According to [29], [34], [46], [54], among physiological signals EEG holds relevant information for emotion detection, suggesting it as a suitable supplement to facial expressions. As far as we know, few works have combined facial expressions with EEG for emotion recognition. In [21], Koelstra et al. used facial expressions together with EEG for emotion classification and implicit affective tagging, but they did not consider arousal and valence classes based on emotion keywords.
In this paper, a new approach for multi-modal emotion recognition is proposed by fusing facial expression and EEG. These modalities are used to classify emotions while users are watching videos with emotional content. The paper's contributions are fourfold: (1) emotion recognition from expressions with a new percentage feature; (2) extraction and selection of spectral power and spectral power difference features for EEG; (3) fusion of facial expressions and EEG for valence and arousal recognition on the challenging MAHNOB-HCI database; and (4) a comparison of our approach to human performance in emotion recognition and analysis.
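Contribution (1), the expression percentage feature, summarizes a whole video by the fraction of frames assigned to each expression category. The sketch below shows the idea; the category list is illustrative, and the actual per-frame classifier and category set are those described in Section 3.

```python
from collections import Counter

# Illustrative expression categories; the paper's own set may differ.
CATEGORIES = ["happiness", "sadness", "surprise",
              "fear", "disgust", "anger", "neutral"]

def expression_percentage_feature(frame_labels, categories=CATEGORIES):
    """Summarise a video's per-frame expression predictions as the
    fraction of frames falling into each category."""
    counts = Counter(frame_labels)
    n = len(frame_labels)
    return [counts[c] / n for c in categories]
```

This turns a variable-length sequence of per-frame decisions into one fixed-length vector per video, which a standard classifier can then map to valence and arousal classes.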
The paper is organized as follows. In Section 2, we briefly introduce the MAHNOB-HCI database used in this work. In Section 3, we present the methods for extracting and fusing facial expression and EEG features. In Section 4, we present the experimental protocol and the results of facial expression analysis, EEG classification and multi-modal emotion recognition. Section 5 concludes the paper with a short discussion of the results and future work.
Section snippets
Database
Different ways of defining expressions and emotions can be used depending on the problem, e.g., prototypical expressions including happiness, sadness, surprise, fear, disgust and anger, or two main dimensions: arousal and valence. The valence dimension ranges from highly positive to highly negative, whereas the arousal dimension ranges from calming or soothing to exciting or agitating. This two-dimensional model of valence and arousal [43] integrates the discrete emotional
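The two-dimensional model above can be illustrated by placing discrete emotions on the valence and arousal axes and reading off coarse class labels. The coordinates below are rough illustrative placements on a [-1, 1] scale, not values from the paper or the database.

```python
# Illustrative (valence, arousal) placements; rough examples only.
EMOTION_SPACE = {
    "happiness": ( 0.8,  0.5),
    "surprise":  ( 0.3,  0.8),
    "anger":     (-0.7,  0.7),
    "fear":      (-0.8,  0.6),
    "disgust":   (-0.6,  0.3),
    "sadness":   (-0.7, -0.4),
}

def quadrant(valence, arousal):
    """Coarse labels of the kind used in binary valence/arousal tasks."""
    v = "positive" if valence >= 0 else "negative"
    a = "high" if arousal >= 0 else "low"
    return f"{v} valence, {a} arousal"
```

Under this mapping, happiness falls in the positive-valence/high-arousal quadrant and sadness in the negative-valence/low-arousal quadrant.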
Facial expression analysis
Facial movements have been studied for emotion (affect) recognition and action unit (facial muscle action) detection, in both of which the features of facial images play an important role. Many different kinds of features have been used to describe facial expressions, and most of them can be broadly categorized into geometry-based and appearance-based features. The former represent the face geometry, such as the shapes and the locations of
Experiments
The MAHNOB-HCI database includes 527 videos recorded from 27 participants. Five videos were excluded due to missing or corrupted EEG data, leaving 522 videos for the experiments. We employ leave-one-participant-out cross-validation: at each step, the samples of one participant form the test set while those of the others form the training set. We report the average classification accuracy over the 27 folds.
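The leave-one-participant-out protocol described above can be sketched as follows. The `fit` and `predict` callables are placeholders for whatever classifier is used; the key point is that all samples of the held-out participant stay out of the training set, so the reported accuracy is participant-independent.

```python
import numpy as np

def leave_one_participant_out(samples, labels, participant_ids, fit, predict):
    """Hold out all samples of one participant per fold, train on the
    rest, and return the mean per-fold accuracy (27 folds for the 27
    participants in the MAHNOB-HCI setup)."""
    accuracies = []
    for pid in sorted(set(participant_ids)):
        test = np.array([p == pid for p in participant_ids])
        model = fit(samples[~test], labels[~test])       # train without pid
        preds = predict(model, samples[test])            # evaluate on pid only
        accuracies.append(float(np.mean(preds == labels[test])))
    return float(np.mean(accuracies))
```

Averaging per-fold accuracies (rather than pooling all predictions) weights every participant equally regardless of how many videos they contributed.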
For obtaining the EPF feature in
Discussion and conclusion
In this paper, multi-modal emotion recognition combining facial expressions and EEG was studied. For facial expression analysis, four common feature descriptors were first investigated. Next, each frame of the test video was classified. Finally, the percentage of each category of recognized frames was used as the expression percentage feature for valence and arousal recognition. For EEG-based emotion recognition, spectral power (SP) and spectral power difference (SPD) features were
Acknowledgments
The authors gratefully acknowledge the Academy of Finland, Infotech Oulu, Nokia Foundation, and Tekes (grant 40297/11) for their support for this work.
References (54)
- et al., Real-time classification of evoked emotions using facial feature tracking and physiological responses, Int. J. Hum. Comput. Stud. (2008)
- et al., EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis, J. Neurosci. Methods (2004)
- et al., Towards a dynamic expression recognition system under facial occlusion, Pattern Recognit. Lett. (2012)
- et al., Fusion of facial expressions and EEG for implicit affective tagging, Image Vision Comput. (2013)
- et al., Subtle facial expression recognition using motion magnification, Pattern Recognit. Lett. (2009)
- et al., Floating search methods in feature selection, Pattern Recognit. Lett. (1994)
- et al., Spontaneous facial expression recognition: a robust metric learning approach, Pattern Recognit. (2014)
- et al., Face description with local binary patterns: application to face recognition, IEEE Trans. Pattern Anal. Mach. Intell. (2006)
- Estimating the posterior probabilities using the k-nearest neighbor rule, Neural Comput. (2005)
- et al., Local ordinal contrast pattern histograms for spatiotemporal, lip-based speaker authentication, IEEE Trans. Inf. Forensics Security (2012)
- LIBSVM: a library for support vector machines, ACM Trans. Intell. Syst. Technol.
- Emotion recognition with consideration of facial expression and physiological signals, Proceedings of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology
- Asymmetrical brain activity discriminates between positive and negative affective stimuli in human infants, Science
- Inferring emotion from facial expression in social context: a role of self-construal?, J. Eur. Psychol. Students
- Expression and the nature of emotion
- Two view learning: SVM-2K, theory and practice, Proceedings of Neural Information Processing Systems
- The world of emotions is not two-dimensional, Psychol. Sci.
- Canonical correlation analysis: an overview with application to learning methods, Neural Comput.
- Locality preserving projections, Proceedings of Neural Information Processing Systems
- Expression recognition in videos using a weighted component-based feature descriptor, Proceedings of the 17th Scandinavian Conference on Image Analysis
- Spatiotemporal local monogenic binary patterns for facial expression recognition, IEEE Signal Process. Lett.
- Action unit detection using sparse appearance descriptors in space-time video volumes, Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition
- Continuous pain intensity estimation from facial expressions, Proceedings of the International Symposium on Advances in Visual Computing
- On combining classifiers, IEEE Trans. Pattern Anal. Mach. Intell.
- An affective computing approach to physiological emotion specificity: toward subject independent and stimulus independent classification of film induced emotions, Psychophysiology
- EEG-based recognition of video-induced emotions: selecting subject-independent feature set, Proceedings of the IEEE International Conference on Engineering in Medicine and Biology Society
- Multimodal emotion recognition by combining physiological signals and facial expressions: a preliminary study, Proceedings of the IEEE International Conference on Engineering in Medicine and Biology Society
1. Equal contributions.