
SCL-FExR: supervised contrastive learning approach for facial expression Recognition

Published in Multimedia Tools and Applications.

Abstract

Facial Expression Recognition (FER) is a significant field of computer vision and has become a crucial component of human-computer interaction. A renaissance of work in contrastive learning, following its state-of-the-art performance in the unsupervised training of deep image models, has produced breakthroughs in self-supervised representation learning. In FER, however, representation quality can degrade because negatives for the contrastive loss are drawn by random sampling, so some of them are in fact same-class false negatives. In this work, we extend the self-supervised contrastive learning technique to the fully supervised setting to exploit label information effectively when classifying facial expressions. We propose a Supervised Contrastive Learning-Facial Expression Recognition (SCL-FExR) system that produces a model robust enough for real-world emotion detection. Our goal is not to compete with highly complex state-of-the-art CNN-based deep neural networks, but to establish a method that less complex models can incorporate to achieve similar performance with greater robustness. We demonstrate the effectiveness of the proposed method on three FER datasets: FER2013, AffectNet, and CK+. On FER2013 we achieve an accuracy of 76%, comparable to far more complex CNN-based deep neural networks, while being significantly more noise-resistant. A secondary aim is to show how a data-based strategy, rather than a model-based one, may be used in place of training very complicated deep learning models, addressing the issue of computational expenditure.
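To make the core idea concrete, the sketch below shows how a supervised contrastive loss uses labels to eliminate the false negatives that random sampling introduces. This is a minimal PyTorch illustration of the standard supervised contrastive (SupCon) formulation of Khosla et al. (2020), which the abstract indicates the method builds on; the function name, temperature value, and shapes are illustrative assumptions, not the authors' exact SCL-FExR implementation.

```python
# Minimal sketch of a supervised contrastive loss (after Khosla et al., 2020).
# Illustrative only; not the authors' SCL-FExR implementation.
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(features: torch.Tensor,
                                labels: torch.Tensor,
                                temperature: float = 0.1) -> torch.Tensor:
    """features: (N, D) embeddings; labels: (N,) integer expression classes."""
    features = F.normalize(features, dim=1)                # work in cosine-similarity space
    logits = features @ features.T / temperature           # (N, N) pairwise similarities
    n = features.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=features.device)
    logits = logits.masked_fill(self_mask, -1e9)           # exclude self-pairs from the softmax
    # With labels, positives are *all* other same-class samples, so no randomly
    # drawn "negative" can silently be a same-class false negative.
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)          # avoid 0-division for lone anchors
    loss = -(log_prob * pos_mask).sum(dim=1) / pos_counts  # mean log-likelihood over positives
    return loss.mean()
```

For a batch of embeddings z (e.g., from a CNN encoder) and expression labels y, `supervised_contrastive_loss(z, y)` pulls same-expression samples together and pushes different expressions apart; a lightweight classifier head can then be trained on top of the resulting encoder, which is the general route to comparable accuracy from less complex models that the abstract describes.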


Data availability

1. The datasets generated during and/or analysed during the current study are available in the FER2013 repository, https://www.kaggle.com/datasets/msambare/fer2013

2. The datasets generated during and/or analysed during the current study are available in the AffectNet repository, http://mohammadmahoor.com/affectnet/, published in Ali Mollahosseini, Behzad Hasani, and Mohammad H. Mahoor, “AffectNet: A New Database for Facial Expression, Valence, and Arousal Computation in the Wild”, IEEE Transactions on Affective Computing, 2017.

3. The datasets generated during and/or analysed during the current study are available in the CK+ repository, https://paperswithcode.com/dataset/ck, published in the article P. Lucey, J. F. Cohn, T. Kanade, J. Saragih, Z. Ambadar and I. Matthews, “The Extended Cohn-Kanade Dataset (CK+): A complete dataset for action unit and emotion-specified expression,” 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops, 2010, pp. 94–101, https://doi.org/10.1109/CVPRW.2010.5543262.


Funding

• The authors did not receive support from any organization for the submitted work; no funds, grants, or other support was received for conducting this study or preparing this manuscript.

Author information

Corresponding author

Correspondence to Kshitiza Vasudeva.

Ethics declarations

Conflict of interest

• The authors have no competing interests to declare that are relevant to the content of this article: no relevant financial or non-financial interests, and no affiliations with or involvement in any organization or entity with any financial or non-financial interest in the subject matter or materials discussed in this manuscript.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Vasudeva, K., Dubey, A. & Chandran, S. SCL-FExR: supervised contrastive learning approach for facial expression Recognition. Multimed Tools Appl 82, 31351–31371 (2023). https://doi.org/10.1007/s11042-023-14803-5
