Multi-level spatial and semantic enhancement network for expression recognition

Applied Intelligence

Abstract

Facial expression recognition (FER) on real-world databases is an active and challenging research topic. Existing CNN-based facial expression classifiers usually perform well on common expressions, such as happiness and surprise, but are less accurate on difficult expressions, such as disgust and fear. Two main factors are responsible for this problem. First, intra-class variation makes difficult expressions harder to classify than other expressions. Second, most FER datasets contain far fewer samples of difficult expressions, and this severe data imbalance leads to overfitting during training. In this work, a new network architecture is proposed to address the intra-class variation problem. The proposed model consists of a spatial enhancement module and a semantic aggregation module that enhance fine-level expression features and high-level semantic features. To alleviate the data imbalance problem, an iterative learning method is introduced to collect difficult expression samples. New samples with inconsistent labels are classified using a fuzzy clustering algorithm. The proposed FER framework has been evaluated on three real-world expression datasets. Experimental results demonstrate that the proposed method significantly improves the recognition accuracy of difficult expressions and achieves top performance compared with state-of-the-art methods.
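The abstract does not say which fuzzy clustering algorithm is used, so the following is only a minimal sketch of the general idea, assuming plain fuzzy c-means on small feature vectors. The function name `fuzzy_c_means`, its parameters, and the toy 2-D points are hypothetical and are not taken from the article; in a real pipeline the inputs would presumably be features extracted by the network.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Plain fuzzy c-means (an assumed stand-in, not the article's method).

    points: sequence of equal-length numeric tuples.
    Returns (centers, memberships); each membership row sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(n_iter):
        # Centers: means of the samples weighted by membership ** m.
        centers = []
        for k in range(n_clusters):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / total
                 for d in range(dim)]
            )

        # Memberships: u_ik is proportional to dist(i, k) ** (-2 / (m - 1)).
        change = 0.0
        for i in range(n):
            inv = [
                max(math.dist(points[i], c), 1e-12) ** (-2.0 / (m - 1.0))
                for c in centers
            ]
            total = sum(inv)
            new_row = [v / total for v in inv]
            change = max(change, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row

        # Stop once no membership moved by more than tol.
        if change < tol:
            break
    return centers, u


# Toy data: two well-separated groups standing in for two expression classes.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
centers, memberships = fuzzy_c_means(points, n_clusters=2)
# Hard label = cluster with the highest membership degree.
labels = [row.index(max(row)) for row in memberships]
```

Unlike hard clustering, each sample keeps a degree of membership in every cluster. This is one plausible reason to use a fuzzy method for samples whose annotations disagree: a sample with no dominant membership can be set aside rather than forced into a class.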

Author information

Corresponding author

Correspondence to Yingdong Ma.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported in part by the National Natural Science Foundation of China under Grant 61461039.

About this article

Cite this article

Ma, Y., Wang, X. & Wei, L. Multi-level spatial and semantic enhancement network for expression recognition. Appl Intell 51, 8565–8578 (2021). https://doi.org/10.1007/s10489-021-02254-0

  • DOI: https://doi.org/10.1007/s10489-021-02254-0
