Two-stream inter-class variation enhancement network for facial expression recognition

Original article · The Visual Computer

Abstract

Automatic facial expression recognition plays a crucial role in computer vision and pattern recognition. However, most existing deep learning-based facial expression classifiers achieve high average accuracy yet recognize difficult expressions, such as fear and disgust, poorly. In this paper, we propose a novel end-to-end architecture, termed the two-stream inter-class variation enhancement network, which learns high-level semantic features and subtle inter-class variations in a joint fashion. More precisely, a global feature extraction network extracts spatial-channel semantic features, while a distinction-reinforced network models the variations between different expressions. The outputs of these two streams are combined by weighted integration in the expression classification network. In addition, a class balanced-weighted cross-entropy loss is designed to further improve feature discrimination. Experimental results indicate that the proposed network significantly improves the recognition of difficult expressions and achieves satisfactory average recognition accuracies of 73.67% on FER2013, 86.17% on RAFDB, 98.19% on CK+, and 98.85% on Oulu-CASIA, outperforming other state-of-the-art methods.
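
The following is a minimal sketch, in PyTorch, of the two ideas the abstract names, under stated assumptions: the two-stream outputs are fused by a single scalar weight (made learnable here, which is an illustrative choice, not necessarily the paper's scheme), and the class balanced-weighted cross-entropy loss is approximated with the effective-number re-weighting of Cui et al., which the paper may define differently. The names `TwoStreamFusionHead` and `class_balanced_weights`, the class counts, and the feature dimension are all hypothetical, not the authors' code.

```python
# Hypothetical illustration only: a weighted two-stream fusion head and a
# class-balanced weighted cross-entropy loss in the spirit of the abstract.
# The fusion scheme and the exact loss are assumptions, not the paper's code.
import torch
import torch.nn as nn


class TwoStreamFusionHead(nn.Module):
    """Fuses global-stream and distinction-stream features before classification."""

    def __init__(self, feat_dim: int, num_classes: int = 7):
        super().__init__()
        # Scalar fusion weight; made learnable here as an illustrative choice.
        self.alpha = nn.Parameter(torch.tensor(0.5))
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, global_feat: torch.Tensor, distinct_feat: torch.Tensor) -> torch.Tensor:
        fused = self.alpha * global_feat + (1.0 - self.alpha) * distinct_feat
        return self.classifier(fused)


def class_balanced_weights(class_counts: torch.Tensor, beta: float = 0.999) -> torch.Tensor:
    """Per-class weights from the effective number of samples (Cui et al., CVPR 2019)."""
    effective_num = 1.0 - torch.pow(beta, class_counts.float())
    weights = (1.0 - beta) / effective_num
    return weights / weights.sum() * class_counts.numel()  # normalize so the mean weight is ~1


# Toy usage with illustrative (not real) class counts for 7 basic expressions.
counts = torch.tensor([4000, 500, 4500, 9000, 6000, 3500, 6500])
criterion = nn.CrossEntropyLoss(weight=class_balanced_weights(counts))
head = TwoStreamFusionHead(feat_dim=512)
logits = head(torch.randn(8, 512), torch.randn(8, 512))
loss = criterion(logits, torch.randint(0, 7, (8,)))
```

With such re-weighting, under-represented and hard classes (e.g., disgust) receive larger per-class weights, which is consistent with the abstract's goal of improving recognition of difficult expressions.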



Acknowledgements

This work was supported by the Special Project on Basic Research of Frontier Leading Technology of Jiangsu Province of China (Grant No. BK20192004C) and the Natural Science Foundation of Jiangsu Province of China (Grant No. BK20181269).

Author information

Corresponding author

Correspondence to Feipeng Da.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Jiang, Q., Zhang, Z., Da, F. et al. Two-stream inter-class variation enhancement network for facial expression recognition. Vis Comput 39, 5209–5227 (2023). https://doi.org/10.1007/s00371-022-02655-3
