
Effective attention feature reconstruction loss for facial expression recognition in the wild

  • Original Article
  • Published in: Neural Computing and Applications

Abstract

Facial expression recognition (FER) in the wild is very challenging due to occlusion, pose, illumination, and other uncontrolled factors. Learning discriminative features for FER with convolutional neural networks is difficult because of severe class imbalance, noisy labels, inter-class similarity, and intra-class variation. Traditional methods optimize the network with the cross-entropy loss to obtain discriminative features for classification; however, this loss cannot effectively address the above problems in practice and does not yield highly discriminative facial features for further analysis. Center loss improves learning efficiency by reducing the intra-class distance of similar expressions, but it does little to address inter-class similarity, class imbalance, and generalization. In this paper, we propose a lightweight Effective Attention Feature Reconstruction loss (EAFR loss) that further optimizes the feature space and enhances the discriminability of expressions. The loss combines a Focal Smoothing loss (FS loss) and an Aggregation-Separation loss (AS loss). First, the FS loss mitigates the poor recognition performance caused by imbalanced classes and prevents overconfident learning behavior. Meanwhile, the AS loss further condenses intra-class expression features and enlarges inter-class distances, using a progressive-stage max-pooling channel and position attention mechanism together with a lightweight asymmetric autoencoder for feature reconstruction. Finally, the EAFR loss joins these two losses to address the typical problems of FER in the wild more comprehensively.
We validate the proposed loss on three of the most widely used large-scale in-the-wild expression datasets (RAF-DB, FERPlus, and AffectNet), and the results show that our model achieves superior performance to several state-of-the-art methods.
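The FS loss described above combines focal weighting (to counter class imbalance) with label smoothing (to discourage overconfident predictions). A minimal NumPy sketch of this general idea is shown below; the function name, the `gamma` and `eps` values, and the exact combination are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def focal_smoothing_loss(logits, labels, gamma=2.0, eps=0.1):
    """Illustrative focal loss with label-smoothed targets (hypothetical
    form; the paper's exact FS loss formulation may differ)."""
    # Softmax with max-shift for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    n, c = probs.shape
    # Smoothed one-hot targets: 1 - eps on the true class, eps/(c-1) elsewhere.
    targets = np.full((n, c), eps / (c - 1))
    targets[np.arange(n), labels] = 1.0 - eps
    # Focal modulation down-weights easy (high-probability) classes.
    focal = (1.0 - probs) ** gamma
    return float(-(targets * focal * np.log(probs + 1e-12)).sum(axis=1).mean())
```

As expected, a confident correct prediction incurs a much smaller loss than a confident wrong one, while the smoothed targets keep the minimum loss strictly above zero.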


References

  1. Acharya D, Huang Z, Pani Paudel D, Van Gool L (2018) Covariance pooling for facial expression recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp 367–374. IEEE

  2. Albanie S, Nagrani A, Vedaldi A, Zisserman A (2018) Emotion recognition in speech using cross-modal transfer in the wild. In: Proceedings of the 26th ACM international conference on Multimedia, pp 292–301

  3. Barsoum E, Zhang C, Ferrer CC, Zhang Z (2016) Training deep networks for facial expression recognition with crowd-sourced label distribution. In: Proceedings of the 18th ACM International Conference on Multimodal Interaction (ICMI), pp 279–283

  4. Cai J, Meng Z, Khan AS, Li Z, O'Reilly J, Tong Y (2018) Island loss for learning discriminative features in facial expression recognition. In: 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG), pp 302–309. IEEE

  5. Fan X, Deng Z, Wang K, Peng X, Qiao Y (2020) Learning discriminative representation for facial expression recognition from uncertainties. In: 2020 IEEE International Conference on Image Processing (ICIP), pp 903–907. IEEE

  6. Farzaneh AH, Qi X (2020) Discriminant Distribution-Agnostic Loss for Facial Expression Recognition in the Wild. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp 406–407. IEEE

  7. Georgescu MI, Ionescu RT, Popescu M (2019) Local learning with deep and handcrafted features for facial expression recognition. IEEE Access 7:64827–64836

  8. Goodfellow IJ, Erhan D, Carrier PL, Courville A, Mirza M, Hamner B, Bengio Y (2013) Challenges in representation learning: A report on three machine learning contests. In: International Conference on Neural Information Processing, pp 117–124. Springer

  9. Guo Y, Zhang L, Hu Y, He X, Gao J (2016) Ms-celeb-1m: A dataset and benchmark for large-scale face recognition. In: Proceedings of the European conference on computer vision (ECCV), pp 87–102. Springer

  10. Hou Q, Zhou D, Feng J (2021) Coordinate attention for efficient mobile network design. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp 13713–13722. IEEE

  11. Huang C (2017) Combining convolutional neural networks for emotion recognition. In: 2017 IEEE MIT Undergraduate Research Technology Conference (URTC), pp 1–4. IEEE

  12. Karnati M, Seal A, Krejcar O, Yazidi A (2020) Facial expression recognition using local gravitational force descriptor-based deep convolution neural networks. IEEE Trans Instrum Meas 70:1–12

  13. Karnati M, Seal A, Krejcar O, Yazidi A (2021) FER-net: facial expression recognition using deep neural net. Neural Comput Appl 33:9125–9136

  14. Karnati M, Seal A, Yazidi A, Krejcar O (2021) LieNet: a deep convolution neural networks framework for detecting deception. IEEE Trans Cogn Develop Syst

  15. Li Y, Zeng J, Shan S, Chen X (2018) Patch-gated CNN for occlusion-aware facial expression recognition. In: 2018 24th International Conference on Pattern Recognition (ICPR), pp 2209–2214. IEEE

  16. Li Y, Lu Y, Li J, Lu G (2019) Separate loss for basic and compound facial expression recognition in the wild. In: Asian Conference on Machine Learning (ACML), pp 897–911

  17. Li S, Deng W, Du JP (2017) Reliable crowdsourcing and deep locality-preserving learning for expression recognition in the wild. In: Proceedings of the IEEE Conference On Computer Vision And Pattern Recognition (CVPR), pp 2852–2861. IEEE

  18. Li Y, Zeng J, Shan S, Chen X (2018) Occlusion aware facial expression recognition using CNN with attention mechanism. IEEE Trans Image Process 28(5):2439–2450

  19. Li H, Wang N, Ding X, Yang X, Gao X (2021) Adaptively learning facial expression representation via CF labels and distillation. IEEE Trans Image Process 30:2016–2028

  20. Lin TY, Goyal P, Girshick R, He K, Dollár P (2017) Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp 2980–2988. IEEE

  21. Liu W, Wen Y, Yu Z, Li M, Raj B, Song L (2017) Sphereface: Deep hypersphere embedding for face recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 212–220. IEEE

  22. Lucey P, Cohn JF, Kanade T, Saragih J, Ambadar Z, Matthews I (2010) The extended Cohn-Kanade dataset (CK+): A complete dataset for action unit and emotion-specified expression. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp 94–101. IEEE

  23. Lyons M, Akamatsu S, Kamachi M, Gyoba J (1998) Coding facial expressions with gabor wavelets. In: Third IEEE International Conference on Automatic Face and Gesture Recognition (FG), pp 200–205. IEEE

  24. Misra D (2019) Mish: A self regularized non-monotonic neural activation function. arXiv preprint arXiv:1908.08681

  25. Mollahosseini A, Hasani B, Mahoor MH (2017) Affectnet: a database for facial expression, valence, and arousal computing in the wild. IEEE Trans Affect Comput 10(1):18–31

  26. Müller R, Kornblith S, Hinton G (2019) When does label smoothing help? In: Proceedings of the 33rd International Conference on Neural Information Processing Systems (NIPS), pp 4694–4703

  27. Ou J, Bai XB, Pei Y, Ma L, Liu W (2010) Automatic facial expression recognition using Gabor filter and expression analysis. In: 2010 Second International Conference on Computer Modeling and Simulation (ICCMS), pp 215–218. IEEE

  28. Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D (2017) Grad-cam: Visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 618–626. IEEE

  29. Shan C, Gong S, McOwan PW (2009) Facial expression recognition based on local binary patterns: a comprehensive study. Image Vis Comput 27(6):803–816

  30. Shao J, Qian Y (2019) Three convolutional neural network models for facial expression recognition in the wild. Neurocomputing 355:82–92

  31. Siqueira H, Magg S, Wermter S (2020) Efficient facial feature learning with wide ensemble-based convolutional neural networks. In: Proceedings of the AAAI conference on artificial intelligence (AAAI), pp 5800–5809

  32. Van der Maaten L, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9(11):2579–2605

  33. Vo TH, Lee GS, Yang HJ, Kim SH (2020) Pyramid with super resolution for in-the-wild facial expression recognition. IEEE Access 8:131988–132001

  34. Wang K, Peng X, Yang J, Lu S, Qiao Y (2020) Suppressing uncertainties for large-scale facial expression recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp 6897–6906. IEEE

  35. Wang K, Peng X, Yang J, Meng D, Qiao Y (2020) Region attention networks for pose and occlusion robust facial expression recognition. IEEE Trans Image Process 29:4057–4069

  36. Wen Y, Zhang K, Li Z, Qiao Y (2016) A discriminative feature learning approach for deep face recognition. In: European conference on computer vision (ECCV), pp 499–515. Springer

  37. Xia HY, Li C, Tan Y, Li L, Song S (2021) Destruction and reconstruction learning for facial expression recognition. IEEE Multimedia 28(2):20–28

  38. Yu Z, Zhang C (2015) Image based static facial expression recognition with multiple deep network learning. In: Proceedings of the 2015 ACM on international conference on multimodal interaction (ICMI), pp 435–442

  39. Zeng J, Shan S, Chen X (2018) Facial expression recognition with inconsistently annotated datasets. In: Proceedings of the European conference on computer vision (ECCV), pp 222–237

  40. Zhang Z, Luo P, Loy CC, Tang X (2018) From facial expression recognition to interpersonal relation prediction. Int J Comput Vis 126(5):550–569

  41. Zhao G, Pietikainen M (2007) Dynamic texture recognition using local binary patterns with an application to facial expressions. IEEE Trans Pattern Anal Mach Intell 29(6):915–928

  42. Zhao Z, Liu Q, Zhou F (2021) Robust lightweight facial expression recognition network with label distribution training. In: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 35(4), pp 3510–3519

  43. Zhao G, Huang X, Taini M, Li SZ, PietikäInen M (2011) Facial expression recognition from near-infrared videos. Image Vis Comput 29(9):607–619


Acknowledgments

This research was supported by the National Science Foundation of China under Grants 61966035 and U1803261, by the Autonomous Region Science and Technology Department International Cooperation Project under Grant 2020E01023, by the Tianshan Innovation Team Plan Project of Xinjiang Uygur Autonomous Region under Grant 202101642, and by the Funds for Creative Research Groups of Higher Education of Xinjiang Uygur Autonomous Region under Grant XJEDU 2017T002.

Author information

Corresponding author

Correspondence to Yurong Qian.

Ethics declarations

Conflict of interest

The authors declare that this work is original and that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Gong, W., Fan, Y. & Qian, Y. Effective attention feature reconstruction loss for facial expression recognition in the wild. Neural Comput & Applic 34, 10175–10187 (2022). https://doi.org/10.1007/s00521-022-07016-8

