Majority biased facial emotion recognition using residual variational autoencoders

Multimedia Tools and Applications

Abstract

Recent studies have established the success of deep learning models in facial emotion recognition. However, such models are often poorly suited to one of the most commonly encountered problems: imbalanced classes. In real datasets, several emotion classes are highly underrepresented, which dramatically reduces the performance of classification models. In the current study, a residual variational autoencoder-based model is proposed to address imbalanced facial emotion recognition. First, to capture the most important features as embeddings, a variational autoencoder equipped with residual connections is trained in an unsupervised fashion to obtain an effective latent space representation of all input images. After training, only the encoder part of the autoencoder is used to transform all labeled facial images into latent vectors. Next, the imbalanced latent vectors are resampled using well-known algorithms of three major types: undersampling, oversampling, and hybrid. To assess the quality of the proposed method, various well-known classifiers are trained and then evaluated using performance indicators derived from the test-phase confusion matrix. All hyperparameters are selected using grid search. In addition, to understand the effect of oversampling minority class samples, a separate study observes classifier performance under varying degrees of oversampling. Experimental results and extensive comparative studies show that the residual variational autoencoder model combined with the SMOTE-ENN hybrid resampling technique improves classifier performance to a greater extent than the alternatives compared.
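The hybrid resampling step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy 2-D vectors stand in for encoder outputs, and the function names (`nearest`, `smote`, `edited_nearest_neighbours`), the neighbourhood size `k=3`, and the sample counts are all assumptions chosen for illustration. SMOTE creates synthetic minority vectors by interpolating between a minority sample and one of its nearest minority neighbours; ENN then removes any sample whose label disagrees with the majority of its nearest neighbours.

```python
import math
import random
from collections import Counter


def nearest(idx, points, candidates, k):
    """Return up to k candidate indices closest to points[idx], excluding idx."""
    others = [j for j in candidates if j != idx]
    others.sort(key=lambda j: math.dist(points[idx], points[j]))
    return others[:k]


def smote(X, y, minority, n_new, k=3, rng=None):
    """Append n_new synthetic minority vectors made by interpolation.

    Assumes the minority class has at least two samples.
    """
    rng = rng or random.Random(0)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    X_out, y_out = list(X), list(y)
    for _ in range(n_new):
        i = rng.choice(minority_idx)
        j = rng.choice(nearest(i, X, minority_idx, k))
        gap = rng.random()  # position on the segment between the two samples
        X_out.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
        y_out.append(minority)
    return X_out, y_out


def edited_nearest_neighbours(X, y, k=3):
    """Drop samples whose label disagrees with the majority of k neighbours."""
    all_idx = range(len(X))
    keep = []
    for i in all_idx:
        votes = Counter(y[j] for j in nearest(i, X, all_idx, k))
        if votes.most_common(1)[0][0] == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


rng = random.Random(42)
# Toy "latent vectors": 40 majority points near (0, 0), 6 minority near (3, 3).
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(40)]
X += [[rng.gauss(3, 0.5), rng.gauss(3, 0.5)] for _ in range(6)]
y = [0] * 40 + [1] * 6

# Oversample the minority class to parity, then clean both classes with ENN.
X_over, y_over = smote(X, y, minority=1, n_new=34, rng=rng)
X_res, y_res = edited_nearest_neighbours(X_over, y_over)
print("original:", Counter(y))
print("after SMOTE:", Counter(y_over))
print("after SMOTE + ENN:", Counter(y_res))
```

In practice a library implementation such as `SMOTEENN` from the imbalanced-learn package would normally be used on the encoder's latent vectors before fitting a classifier; the hand-written version here only makes the two stages explicit.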

Data Availability

The datasets analysed during the current study are available from the corresponding author on reasonable request.

Funding

The authors declare that no funding was received for conducting this study or preparing the manuscript.

Author information

Corresponding author

Correspondence to Sankhadeep Chatterjee.

Ethics declarations

Conflicts of interest

The authors have no competing interests or conflicts of interest to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chatterjee, S., Maity, S., Ghosh, K. et al. Majority biased facial emotion recognition using residual variational autoencoders. Multimed Tools Appl 83, 13659–13688 (2024). https://doi.org/10.1007/s11042-023-15888-8

  • DOI: https://doi.org/10.1007/s11042-023-15888-8