Abstract
Facial Expression Recognition (FER) is a significant field of computer vision and has emerged as a crucial component of human-computer interaction. A renaissance of work in contrastive learning has produced breakthroughs in self-supervised representation learning, delivering state-of-the-art performance in the unsupervised training of deep image models. In FER, however, representation quality can degrade because contrastive losses sample negatives at random, and some of these are false negatives drawn from the same expression class. In this work, we extend the self-supervised contrastive learning technique to the fully supervised setting to exploit label information effectively when classifying facial expressions. We propose a Supervised Contrastive Learning for Facial Expression Recognition (SCL-FExR) system that yields a model robust enough for real-world emotion detection. Our goal is not to compete with highly complex state-of-the-art CNN-based deep neural networks, but to establish a method that less complex models can incorporate to achieve similar performance with greater robustness. We demonstrate the effectiveness of the proposed method on three FER datasets: FER2013, AffectNet, and CK+. On FER2013 we achieve a comparable accuracy of 76%, showing that the method can be incorporated into less complex CNN-based deep neural networks to achieve robustness and significantly better noise resistance. A secondary aim is to show how a data-based strategy, rather than a model-based one, may be used to train very complicated deep learning models, addressing the issue of computational expenditure.
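The supervised extension of the contrastive loss described in the abstract follows Khosla et al. (2020, cited in the references): instead of treating only augmented views of an anchor as positives, every sample sharing the anchor's label becomes a positive, which removes the false-negative problem. A minimal pure-Python sketch of that loss is shown below; the function name `sup_con_loss`, the temperature default, and the toy 2-D embeddings are illustrative assumptions, not taken from the paper.

```python
import math

def sup_con_loss(embeddings, labels, temperature=0.1):
    """Sketch of the supervised contrastive loss (Khosla et al., 2020).

    embeddings: L2-normalised feature vectors (lists of floats).
    labels: class label per embedding; same-label pairs are positives.
    """
    n = len(embeddings)

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    total = 0.0
    for i in range(n):
        # All same-label samples (other than the anchor) are positives.
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # anchors without positives contribute nothing here
        # Denominator runs over every sample except the anchor itself.
        denom = sum(math.exp(dot(embeddings[i], embeddings[a]) / temperature)
                    for a in range(n) if a != i)
        for p in positives:
            sim = math.exp(dot(embeddings[i], embeddings[p]) / temperature)
            total += -math.log(sim / denom) / len(positives)
    return total / n
```

On a toy batch, embeddings clustered by label give a near-zero loss, while embeddings mixed across labels are penalised heavily, which is the property the abstract relies on for robustness.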
Data availability
1. The datasets generated during and/or analysed during the current study are available in the FER2013 repository, https://www.kaggle.com/datasets/msambare/fer2013
2. The datasets generated during and/or analysed during the current study are available in the AffectNet repository, http://mohammadmahoor.com/affectnet/, published in Ali Mollahosseini, Behzad Hasani, and Mohammad H. Mahoor, “AffectNet: A New Database for Facial Expression, Valence, and Arousal Computation in the Wild”, IEEE Transactions on Affective Computing, 2017.
3. The datasets generated during and/or analysed during the current study are available in the CK+ repository, https://paperswithcode.com/dataset/ck, published in the article P. Lucey, J. F. Cohn, T. Kanade, J. Saragih, Z. Ambadar and I. Matthews, “The Extended Cohn-Kanade Dataset (CK+): A complete dataset for action unit and emotion-specified expression,” 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops, 2010, pp. 94–101, https://doi.org/10.1109/CVPRW.2010.5543262.
References
Ahonen T, Hadid A, Pietikäinen M (2004) Face recognition with local binary patterns. In: European Conference on Computer Vision. Springer, Berlin, Germany. pp. 469–481
Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems 25
Bau D, Zhou B, Khosla A, Oliva A, Torralba A (2017) Network dissection: Quantifying interpretability of deep visual representations. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 6541–6549
Bisogni C, Castiglione A, Hossain S, Narducci F, Umer S (2022) Impact of deep learning approaches on facial expression recognition in healthcare industries. IEEE Transac Indust Inform 18(8):5619–5627
Breuer R, Kimmel R (2017) A deep learning perspective on the origin of facial expressions. arXiv preprint arXiv:1705.01842
Carrier PL, Courville A, Goodfellow IJ, Mirza M, Bengio Y (2013) FER-2013 face database. Université de Montréal, Montreal, QC, Canada
Chaitanya K, Erdil E, Karani N, Konukoglu E (2020) Contrastive learning of global and local features for medical image segmentation with limited annotations. Adv Neural Inf Proces Syst 33:12546–12558
Chen L, Bentley P, Mori K, Misawa K, Fujiwara M, Rueckert D (2019) Self-supervised learning for medical image analysis using image context restoration. Med Image Anal 58:101539
Chen T, Kornblith S, Norouzi M, Hinton G (2020) A simple framework for contrastive learning of visual representations. In: International Conference on Machine Learning. pp. 1597–1607
Doersch C, Gupta A, Efros AA (2015) Unsupervised visual representation learning by context prediction. In: Proceedings of the IEEE International Conference on Computer Vision. pp. 1422–1430
Dosovitskiy A, Springenberg JT, Riedmiller M, Brox T (2014) Discriminative unsupervised feature learning with convolutional neural networks. Advances in neural information processing systems 27
Fu R, Hu Q, Dong X, Guo Y, Gao Y, Li B (2020) Axiom-based grad-cam: Towards accurate visualization and explanation of cnns. arXiv preprint arXiv:2008.02312
Gan Y (2018) Facial expression recognition using convolutional neural network. In: Proceedings of the 2nd international conference on vision, image and signal processing. pp. 1–5
Georgescu M-I, Ionescu RT, Popescu M (2019) Local learning with deep and handcrafted features for facial expression recognition. IEEE Access 7:64827–64836
Gidaris S, Singh P, Komodakis N (2018) Unsupervised representation learning by predicting image rotations. arXiv preprint arXiv:1803.07728
Gunel B, Jingfei D, Conneau A, Stoyanov V (2020) Supervised contrastive learning for pre-trained language model fine-tuning. arXiv preprint arXiv:2011.01403
Hadsell R, Chopra S, LeCun Y (2006) Dimensionality reduction by learning an invariant mapping. In: 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06), vol 2. IEEE, pp. 1735–1742
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 770–778
Hua W, Dai F, Huang L, Xiong J, Gui G (2019) HERO: human emotions recognition for realizing intelligent internet of things. IEEE Access 7:24321–24332
Huang Y, Chen F, Lv S, Wang X (2019) Facial expression recognition: a survey. Symmetry 11(10):1189
Jeon J, Park J-C, Jo YJ, Nam CM, Bae K-H, Hwang Y, Kim D-S (2016) A real-time facial expression recognizer using deep neural network. In: proceedings of the 10th international conference on ubiquitous information management and communication. pp. 1–4
Khosla P, Teterwak P, Wang C, Sarna A, Tian Y, Isola P, Maschinot A, Liu C, Krishnan D (2020) Supervised contrastive learning. Adv Neural Inf Proces Syst 33:18661–18673
Kim B-K, Roh J, Dong S-Y, Lee S-Y (2016) Hierarchical committee of deep convolutional neural networks for robust facial expression recognition. J Multim User Interfaces 10(2):173–189
Knyazev B, Shvetsov R, Efremova N et al (2017) Convolutional neural networks pretrained on large face recognition datasets for emotion classification from video. arXiv preprint arXiv:1711.04598
Kolesnikov A, Zhai X, Beyer L (2019) Revisiting self-supervised visual representation learning. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pp 1920–1929
Li S, Deng W (2020) Deep facial expression recognition: a survey. IEEE Trans Affect Comput
Li Y, Zeng J, Shan S, Chen X (2018) Patch-Gated CNN for occlusion aware facial expression recognition. In: Proc. ICPR. pp. 2209–2214
Liu M, Li S, Shan S, Chen X (2012) Enhancing expression recognition in the wild with unlabeled reference data. In Asian Conference on Computer Vision, Springer, pages 577–588
Lucey P, Cohn JF, Kanade T, Saragih J, Ambadar Z, Matthews I (2010) The extended cohn-kanade dataset (ck+): A complete dataset for action unit and emotion-specified expression. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops, San Francisco, CA, USA. pp. 94–101
Misra I, van der Maaten L (2020) Self-supervised learning of pretext-invariant representations. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 6707–6717
Mollahosseini A, Chan D, Mahoor MH (2016) Going deeper in facial expression recognition using deep neural networks. In: Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Placid, NY, USA. pp. 1–10
Mollahosseini A, Hasani B, Mahoor MH (2017) AffectNet: a new database for facial expression, valence, and arousal computation in the wild. IEEE Transactions on Affective Computing
Naik AJ, Gopalakrishna MT (2021) Deep-violence: individual person violent activity detection in video. Multimed Tools Appl 80:18365–18380
Noroozi M, Favaro P (2016) Unsupervised learning of visual representations by solving jigsaw puzzles. In: European Conference on Computer Vision, Springer, pp. 69–84
Rahimi Taghanaki S, Etemad A (2020) Self-supervised wearable-based activity recognition by learning to forecast motion. arXiv e-prints. pp. arXiv–2010
Ramachandran P, Zoph B, Le QV (2017) Swish: a self-gated activation function. arXiv preprint arXiv:1710.05941
Rifai S, Bengio Y, Courville A, Vincent P, Mirza M (2012) Disentangling factors of variation for facial expression recognition. In European Conference on Computer Vision (ECCV), Springer, pages 808–822
Roy S, Etemad A (2021) Self-supervised contrastive learning of multi-view facial expressions. In: Proceedings of the 2021 International Conference on Multimodal Interaction. pp. 253–257
Roy S, Etemad A (2021) Spatiotemporal contrastive learning of facial expressions in videos. In: 2021 9th International Conference on Affective Computing and Intelligent Interaction (ACII). IEEE, pp. 1–8
Selvaraju RR, Das A, Vedantam R, Cogswell M, Parikh D, Batra D (2016) Grad-CAM: Why did you say that?. arXiv preprint arXiv:1611.07450
Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D (2017) Grad-cam: Visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE international conference on computer vision. pp. 618–626
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556
Spurr A, Dahiya A, Wang X, Zhang X, Hilliges O (2021) Self-supervised 3D hand pose estimation from monocular RGB via contrastive learning. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 11230–11239
Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 1–9
Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International conference on machine learning. pp. 6105–6114. PMLR
Tian Y, Krishnan D, Isola P (2020) Contrastive multiview coding. In: European conference on computer vision, pp. 776–794. Springer, Cham.
Wu Z, Xiong Y, Yu SX, Lin D (2018) Unsupervised feature learning via non-parametric instance discrimination. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3733–3742
Zeng D, Lin Z, Yan X, Liu Y, Wang F, Tang B (2022) Face2Exp: combating data biases for facial expression recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 20291–20300
Zhang R, Isola P, Efros AA (2016) Colorful image colorization. In: European conference on computer vision. pp. 649–666. Springer
Zhao X, Vemulapalli R, Mansfield PA, Gong B, Green B, Shapira L, Wu Y (2021) Contrastive learning for label efficient semantic segmentation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 10623–10633
Zhuang C, Zhai AL, Yamins D (2019) Local aggregation for unsupervised learning of visual embeddings. In Proceedings of the IEEE International Conference on Computer Vision. pp. 6002–6012
Funding
• The authors did not receive support from any organization for the submitted work.
• No funding was received to assist with the preparation of this manuscript.
• No funding was received for conducting this study.
• No funds, grants, or other support was received.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interests
• The authors have no relevant financial or non-financial interests to disclose.
• The authors have no competing interests to declare that are relevant to the content of this article.
• All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.
• The authors have no financial or proprietary interests in any material discussed in this article.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Vasudeva, K., Dubey, A. & Chandran, S. SCL-FExR: supervised contrastive learning approach for facial expression Recognition. Multimed Tools Appl 82, 31351–31371 (2023). https://doi.org/10.1007/s11042-023-14803-5
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11042-023-14803-5