Facial expression recognition (FER) plays an important role in human-computer interaction. Convolutional neural networks (CNNs) and other neural network models are widely used in automatic image classification systems, where hierarchical networks learn high-level features automatically. However, training a CNN requires large amounts of training data to achieve adequate generalization. In contrast, the traditional scale-invariant feature transform (SIFT) does not require a large training set to extract features. In this paper, we propose a feature extraction method for FER from a single image frame. The proposed hybrid features combine SIFT features with deep learning features extracted from different levels of a CNN model, and the combined features are classified into expressions using support vector machines (SVMs). The performance of the proposed method is evaluated on the publicly available extended Cohn-Kanade (CK+) database. To assess the generalization ability of the method, we also design and carry out several experiments in a cross-database setting. The proposed hybrid SIFT-CNN features achieve an FER accuracy of 94.82%, compared with 76.57% using SIFT bag-of-features (SIFT-BoF) features and 92.87% using CNN features alone. The additional cross-database experiments also demonstrate the considerable potential of combining shallow features with deep learning features, and their results are more promising than those of state-of-the-art models. Combining shallow and deep learning features is effective when the training data are insufficient to obtain a deep model with good generalization ability.
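The abstract does not specify how the SIFT and CNN features are fused, so the following is only a minimal sketch under stated assumptions. It assumes that SIFT descriptors are quantized against a precomputed codebook (for example, one learned with k-means) into a hard-assignment bag-of-features histogram, that CNN activations from each chosen layer are already available as flat vectors, and that fusion is plain concatenation after per-part L2 normalization. All function names are hypothetical, and the toy 2-D descriptors stand in for real 128-D SIFT descriptors.

```python
import math
from typing import List, Sequence

Vector = List[float]


def nearest_codeword(desc: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the codeword closest (squared Euclidean) to one descriptor."""
    best_idx, best_dist = 0, math.inf
    for idx, word in enumerate(codebook):
        dist = sum((d - w) ** 2 for d, w in zip(desc, word))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def bof_histogram(descriptors: Sequence[Sequence[float]],
                  codebook: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical SIFT-BoF step: count hard codeword assignments, then
    convert counts to frequencies (an assumption; the paper may differ)."""
    hist = [0.0] * len(codebook)
    for desc in descriptors:
        hist[nearest_codeword(desc, codebook)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def l2_normalize(vec: Sequence[float]) -> Vector:
    """Scale a vector to unit Euclidean length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else list(vec)


def hybrid_feature(descriptors: Sequence[Sequence[float]],
                   codebook: Sequence[Sequence[float]],
                   cnn_layers: Sequence[Sequence[float]]) -> Vector:
    """Assumed fusion: L2-normalized BoF histogram followed by the
    L2-normalized activations of each CNN layer, all concatenated."""
    parts = [l2_normalize(bof_histogram(descriptors, codebook))]
    parts += [l2_normalize(layer) for layer in cnn_layers]
    return [x for part in parts for x in part]


if __name__ == "__main__":
    # Toy data: 2-D "descriptors", a 2-word codebook, one 2-D "CNN layer".
    codebook = [[0.0, 0.0], [1.0, 1.0]]
    descriptors = [[0.1, 0.0], [0.9, 1.0], [1.0, 0.8], [0.0, 0.2]]
    print(hybrid_feature(descriptors, codebook, [[3.0, 4.0]]))
```

The fused vector would then be passed to an SVM classifier (for instance scikit-learn's `sklearn.svm.SVC`); that step, along with the actual SIFT and CNN extraction, is omitted here. Normalizing each part separately is one way to keep a high-dimensional CNN layer from dominating the much shorter BoF histogram purely through scale.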

This work was supported by the State Key Program of the National Natural Science Foundation of China (61432004, 71571058, 61461045). It was partially supported by the China Postdoctoral Science Foundation funded project (2017T100447) and by the National Natural Science Foundation of China under Grant No. 61472117. This work was also supported by the foundational application research program of the Qinghai Province Science and Technology Fund (No. 2016-ZJ-743) and by the Open Project Program of the National Laboratory of Pattern Recognition (NLPR).
The authors declare that they have no conflicts of interest.
This article does not contain any studies with human participants or animals performed by any of the authors.
