Abstract
Automatic facial expression recognition plays a crucial role in computer vision and pattern recognition. However, most existing deep learning-based facial expression classifiers achieve high average accuracy yet recognize difficult expressions, such as fear and disgust, poorly. In this paper, we propose a novel end-to-end architecture, termed the two-stream inter-class variation enhancement network, which learns high-level semantic features and subtle inter-class variations in a joint fashion. More precisely, a global feature extraction network extracts spatial-channel semantic features, while a distinction-reinforced network models the variations between different expressions. The outputs of the two streams are fused with weights in the expression classification network. In addition, a class-balanced weighted cross-entropy loss is designed to further improve feature discrimination. Experimental results indicate that the proposed network significantly improves the recognition of difficult expressions and achieves satisfactory average recognition accuracies of 73.67% on FER2013, 86.17% on RAF-DB, 98.19% on CK+, and 98.85% on Oulu-CASIA, outperforming other state-of-the-art methods.
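The abstract describes two ideas, weighted fusion of the two stream outputs and a class-balanced weighted cross-entropy loss, without giving implementation details. The PyTorch sketch below is a minimal illustration of both under stated assumptions: the module name TwoStreamFusionClassifier, the learnable fusion weight alpha, and the effective-number class-weighting scheme are illustrative choices, not the authors' exact formulation.

```python
# Minimal sketch of (1) weighted fusion of two classification streams and
# (2) a class-balanced weighted cross-entropy loss.
# The fusion weight and the effective-number weighting are assumptions made
# for illustration; the paper's exact design may differ.

import torch
import torch.nn as nn
import torch.nn.functional as F


class TwoStreamFusionClassifier(nn.Module):
    """Fuses a 'global' stream and a 'distinction' stream with a learnable weight."""

    def __init__(self, feat_dim: int, num_classes: int = 7):
        super().__init__()
        self.global_head = nn.Linear(feat_dim, num_classes)
        self.distinction_head = nn.Linear(feat_dim, num_classes)
        # Learnable scalar fusion weight (assumed; a fixed or scheduled weight also works).
        self.alpha = nn.Parameter(torch.tensor(0.5))

    def forward(self, global_feat: torch.Tensor, distinction_feat: torch.Tensor) -> torch.Tensor:
        logits_g = self.global_head(global_feat)
        logits_d = self.distinction_head(distinction_feat)
        a = torch.sigmoid(self.alpha)  # keep the fusion weight in (0, 1)
        return a * logits_g + (1.0 - a) * logits_d


def class_balanced_weights(samples_per_class: torch.Tensor, beta: float = 0.999) -> torch.Tensor:
    """Per-class weights from the effective-number-of-samples heuristic (assumed scheme)."""
    effective_num = 1.0 - torch.pow(beta, samples_per_class.float())
    weights = (1.0 - beta) / effective_num
    # Normalize so the weights sum to the number of classes.
    return weights * samples_per_class.numel() / weights.sum()


if __name__ == "__main__":
    num_classes, feat_dim, batch = 7, 512, 8
    model = TwoStreamFusionClassifier(feat_dim, num_classes)

    # Dummy features standing in for the outputs of the two backbone streams.
    g = torch.randn(batch, feat_dim)
    d = torch.randn(batch, feat_dim)
    labels = torch.randint(0, num_classes, (batch,))

    # Imbalanced class counts, e.g. few 'fear'/'disgust' samples.
    counts = torch.tensor([7000, 500, 450, 6000, 5000, 4000, 5500])
    weights = class_balanced_weights(counts)

    logits = model(g, d)
    loss = F.cross_entropy(logits, labels, weight=weights)
    loss.backward()
    print(f"class-balanced weighted CE loss: {loss.item():.4f}")
```

Up-weighting rare classes in this way penalizes errors on under-represented expressions such as fear and disgust more heavily, which is the motivation the abstract gives for the class-balanced weighted loss.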

Acknowledgements
This work was supported by the Special Project on Basic Research of Frontier Leading Technology of Jiangsu Province of China (Grant No. BK20192004C) and the Natural Science Foundation of Jiangsu Province of China (Grant No. BK20181269).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Jiang, Q., Zhang, Z., Da, F. et al. Two-stream inter-class variation enhancement network for facial expression recognition. Vis Comput 39, 5209–5227 (2023). https://doi.org/10.1007/s00371-022-02655-3