Micro-expression recognition using 3D DenseNet fused Squeeze-and-Excitation Networks
Introduction
Facial expression is a significant part of people’s daily communication and conveys abundant emotional information. Ref. [1] showed that emotional information transmission consists of 55% facial expressions, 38% voice, and 7% language. Even people of different cultural backgrounds and skin colors can communicate emotionally through facial expressions when a shared language is lacking. Facial expressions originate from the changes in texture and geometry caused by the movement of facial muscle tissues, and the regularity of facial muscle movement is universal to all human beings. Therefore, facial expression has occupied an irreplaceable position in human communication since the beginning of human existence.
Generally, expressions can be divided into macro-expressions and micro-expressions by their muscle movement range. A macro-expression lasts for a longer time (i.e., 3/4 to 2 s) and involves a large range of muscle movement; therefore, it is easy to detect with the naked eye. However, experts in psychology [2] have shown that macro-expressions can be deceptive in various life situations, so they cannot reliably represent people’s true feelings. In contrast, the duration of a micro-expression is short (i.e., 1/25 to 1/5 s), and its motion range is relatively weak [3]. Therefore, it is a challenge for human beings to discover and recognize micro-expressions with the naked eye. Compared with macro-expressions, micro-expressions appear on the face unconsciously and can reveal one’s true feelings. In 1966, Haggard et al. first discovered micro-expressions and argued that they are related to people’s self-protection mechanism [4]. Since then, increasing numbers of researchers have begun to focus on micro-expression recognition (MER) [5], [6], [7]. After decades of research and development, micro-expression analysis has been widely applied in medical treatment [8], lie detection [9], security systems [10], etc.
To realize practical applications of micro-expression recognition, researchers began studying the relevant theory very early. In 1997, Ekman [11] established the Facial Action Coding System (FACS) to describe the relationship between facial muscle motion and facial expression. According to the anatomical characteristics of facial muscles, FACS divides them into several independent action units (AUs) that describe the intensity and position of facial expressions. A facial expression consists of one or more AUs; for example, happiness usually consists of AU6+AU12. Micro-expression AUs keep a low intensity, and in most cases only a single action unit is observed to change. In 2002, Ekman developed the Micro Expression Training Tool (METT) to train people’s ability to recognize micro-expressions; he believed that individuals who use the METT training program can improve their recognition ability by 30% to 40% within 1.5 h. In 2009, Polikovsky et al. [12] proposed a micro-expression database and used a 3D gradient histogram to extract facial motion features for micro-expressions; since then, machine-learning-based MER has become increasingly popular. In 2011, Pfister et al. [13] established the SMIC spontaneous micro-expression database. The images in the SMIC database are closer to micro-expressions in real environments, which makes MER research more reliable.
To date, researchers have proposed various methods to achieve micro-expression classification. Early MER research [1], [14] mainly relied on hand-crafted descriptors such as the spatiotemporal Local Binary Pattern (LBP), Local Binary Patterns from Three Orthogonal Planes (LBP-TOP), and the Directional Mean Optical Flow feature. However, hand-crafted features are time-consuming to design, can only extract shallow high-dimensional features from the original video, and lack the capacity to express more abstract features.
With the rapid advancement of computer hardware, deep learning has received extensive attention, and its wide application in various fields demonstrates its striking efficiency. Based on deep learning methods, Ref. [15] proposed an Enriched Long-term Recurrent Convolutional Network (ELRCN): the network first applies a CNN module to encode each micro-expression frame into a feature vector and then uses a long short-term memory network (LSTM) to perform prediction. The method achieves 60.98% on the CASME dataset under the leave-one-subject-out cross-validation (LOSOCV) protocol. Notably, although Ref. [15] achieved a relatively high recognition rate, it was still lower than quite a few traditional extraction methods [16], [17], [18]. This is due to the small sample sizes in MER: deep learning methods depend on large-scale datasets to extract deeper features. To increase the amount of data, Ref. [19] implemented data augmentation to generate additional composite images from existing datasets, and the results outperformed many traditional MER methods, proving that proper preprocessing can improve recognition for deep learning methods.
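The LOSOCV protocol mentioned above can be sketched in plain Python: every fold holds out all clips of one subject, so the model is always evaluated on an unseen person. The sample/subject layout below is hypothetical.

```python
# Sketch of leave-one-subject-out cross-validation (LOSOCV): one fold per
# unique subject, with all of that subject's samples held out for testing.
def losocv_splits(subjects):
    """Yield (train_idx, test_idx) pairs, one fold per unique subject."""
    for held_out in sorted(set(subjects)):
        test_idx = [i for i, s in enumerate(subjects) if s == held_out]
        train_idx = [i for i, s in enumerate(subjects) if s != held_out]
        yield train_idx, test_idx

# Hypothetical example: 6 clips from 3 subjects -> 3 folds.
subjects = ["s1", "s1", "s2", "s2", "s3", "s3"]
folds = list(losocv_splits(subjects))
```

Because subjects never leak between train and test sets, LOSOCV gives a stricter estimate of generalization than a random hold-out split, which is why it is the standard protocol for small MER datasets.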
In addition to dataset limitations, overfitting, redundant parameters, and heavy computation also significantly hinder deep-learning-based micro-expression recognition. To reduce unnecessary computation and improve the generalization ability of the model, we choose DenseNet as the backbone for micro-expression feature extraction. In addition, DenseNet can effectively alleviate the common vanishing-gradient problem through its unique connection pattern.
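DenseNet's connection pattern, in which each layer receives the concatenation of all preceding feature maps, can be sketched as a toy numpy computation. The random "layer" below is a stand-in for a BN-ReLU-Conv unit; shapes and the growth rate are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

# Toy sketch of DenseNet connectivity: each layer consumes the concatenation
# of all previous feature maps, so the channel count grows by the growth
# rate k per layer and every layer has a short gradient path to the input.
def dense_block(x, num_layers=4, growth_rate=12, seed=0):
    """x: (C, H, W) feature map; returns the concatenated block output."""
    rng = np.random.default_rng(seed)
    features = [x]
    for _ in range(num_layers):
        concat = np.concatenate(features, axis=0)          # (C_total, H, W)
        # stand-in for BN-ReLU-Conv producing `growth_rate` new channels
        w = rng.standard_normal((growth_rate, concat.shape[0]))
        new = np.maximum(np.tensordot(w, concat, axes=1), 0.0)  # ReLU
        features.append(new)
    return np.concatenate(features, axis=0)

out = dense_block(np.ones((16, 8, 8)))  # channels: 16 + 4 * 12 = 64
```

The concatenation (rather than ResNet-style addition) is what gives every layer direct access to earlier features and gradients, which is the property the text credits for alleviating vanishing gradients.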
This paper proposes a new and robust feature learning model with effective preprocessing, which can efficiently represent the subtle facial muscle movement in the MER process. The main contributions of this paper are summarized as follows:
- 1.
Extend the collected apex frames (i.e., data augmentation) from three public datasets (i.e., SMIC, CAS(ME)2, and CASME-II) to alleviate the small-sample-size limitation, and then exploit Eulerian video magnification to better represent the details of facial muscle movement.
- 2.
Perform a three-dimensional extension of the convolution kernels and pooling layers in the DenseNet model (i.e., 2D-DenseNet to 3D-DenseNet), which can better capture the spatial and temporal information in video sequences and thus extract deeper facial muscle features.
- 3.
Integrate the attention mechanism by squeezing and exciting the 3D DenseNet channels, so that the network (i.e., 3D SE-DenseNet) can adaptively assign feature weights, strengthening the extraction of effective features and suppressing useless ones. Moreover, we experimentally compare variants of 3D SE-DenseNet to observe the impact of SENet at different critical positions in DenseNet.
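The squeeze-and-excitation gating applied to a 3D feature map, as in contribution 3, can be sketched in numpy. This is a minimal, framework-agnostic sketch: the reduction ratio and weight shapes are illustrative assumptions, and a real SE block learns `w1` and `w2` end-to-end.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Minimal sketch of a squeeze-and-excitation (SE) gate on a 3D feature map.
# Squeeze: global average pool over (T, H, W) -> one descriptor per channel.
# Excite: FC-ReLU-FC-sigmoid bottleneck -> per-channel gate in (0, 1).
# Scale: rescale each channel, emphasizing informative ones.
def se_gate_3d(x, w1, w2):
    """x: (C, T, H, W); w1: (C//r, C); w2: (C, C//r)."""
    c = x.shape[0]
    z = x.mean(axis=(1, 2, 3))                  # squeeze -> (C,)
    s = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))   # excitation -> (C,)
    return x * s.reshape(c, 1, 1, 1)            # channel-wise rescaling

rng = np.random.default_rng(0)
c, r = 8, 2                                     # channels, reduction ratio
w1 = rng.standard_normal((c // r, c))
w2 = rng.standard_normal((c, c // r))
y = se_gate_3d(rng.standard_normal((c, 4, 6, 6)), w1, w2)
```

Because the gate is a per-channel scalar in (0, 1), the block adds very few parameters relative to the 3D convolutions it modulates, which fits the paper's goal of limiting redundant computation.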
The rest of this paper is organized as follows: Section 2 briefly reviews related work. Section 3 presents the main methods of this paper. Section 4 describes the datasets and experimental settings. Section 5 compares and analyzes our results against other representative methods and reports various contrast experiments. Finally, Section 6 gives a brief conclusion.
Section snippets
Related work
Most methods in the literature combine image preprocessing and feature extraction. In the following subsections, several representative preprocessing techniques and deep-learning-based feature extractors used in MER are discussed.
Proposed method
Although CNN, LSTM, and their improved variants have obtained impressive performance for micro-expression recognition [25], [26], [27], [33], the vanishing-gradient problem still occurs to some extent in long-sequence tasks (i.e., CASME-II). Therefore, this paper aims to solve the vanishing-gradient problem for micro-expression recognition in long video sequences and to alleviate the small-sample-size limitation.
The framework proposed in this paper is demonstrated in Fig. 1, which
Dataset
Experiments are performed on three standard micro-expression datasets: the Spontaneous Micro-expression Corpus (SMIC) [33], the Chinese Academy of Sciences Macro- and Micro-expressions database (CAS(ME)2) [2], and the Chinese Academy of Sciences Micro-expression-II database (CASME-II) [34]. More details on these datasets are given below.
Recognition performance
The cross-validation method (i.e., the hold-out protocol) is widely used to evaluate prediction performance, especially the performance of a trained model on new data, which can reduce over-fitting to a certain extent. As shown in Table 3, on three public datasets (i.e., SMIC, CASME-II, and CAS(ME)2) we compare our two deep models (i.e., 3D-DenseNet and 3D SE-DenseNet) with representative state-of-the-art deep-learning-based methods [21], [29], [30], [32], [33]. As shown in
Conclusion
In this paper, we proposed a novel micro-expression recognition approach based on a densely connected convolutional network with SENet (3D SE-DenseNet). The proposed model extends DenseNet to three-dimensional processing (i.e., 3D-DenseNet) and uses SE blocks to adaptively assign weights to feature channels, enhancing the learning ability and modeling the spatiotemporal deformation of the micro-expression sequence. First, the augmented and EVM-amplified video sequences are computed from apex frames.
CRediT authorship contribution statement
Linqin Cai: Conceptualization, Methodology, Software, Investigation, Writing – review & editing. Hao Li: Experiment, Data processing, Writing – original draft, Visualization, Data curation. Wei Dong: Experiment, Software, Validation. Haodu Fang: Data preprocessing, Validation.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (34)
- et al., Review of micro-expression spotting and recognition in video sequences, Virtual Real. Intell. Hardw. (2021)
- et al., Micro-expression identification and categorization using a facial dynamics map, IEEE Trans. Affect. Comput. (2017)
- et al., Spontaneous facial micro-expression analysis using spatiotemporal completed local quantized patterns, Neurocomputing (2016)
- et al., Eulerian video magnification for revealing subtle changes in the world, ACM Trans. Graph. (2012)
- et al., Micro-attention for micro-expression recognition, Neurocomputing (2020)
- et al., A spontaneous micro-expression database: Inducement, collection and baseline
- et al., CAS(ME)2: A database for spontaneous macro-expression and micro-expression spotting and recognition, IEEE Trans. Affect. Comput. (2018)
- et al., Learning bases of activity for facial expression recognition, IEEE Trans. Image Process. (2017)
- et al., Micromomentary facial expressions as indicators of ego mechanisms in psychotherapy
- Face recognition and micro-expression recognition based on discriminant tensor subspace analysis plus extreme learning machine, Neural Process. Lett.
- Diagnosing clinical manifestation of apathy using machine learning and micro-facial expressions detection
- Hiding true emotions: micro-expressions in eyes retrospectively concealed by mouth movements, Sci. Rep.
- Facial age and expression synthesis using ordinal ranking adversarial networks, IEEE Trans. Inf. Forensics Secur.
- Facial micro-expressions recognition using high speed camera and 3D-gradient descriptor