Emotion Recognition System via Facial Expressions and Speech Using Machine Learning and Deep Learning Techniques

  • Original Research
  • Published in SN Computer Science

Abstract

Patients in hospitals frequently exhibit psychological issues such as sadness, pessimism, eccentricity, and anxiety, yet hospitals normally lack the tools and facilities to monitor patients' psychological health continuously. Identifying depression early allows it to be managed by promptly providing better therapy, and advances in machine learning for image processing, with notable applications in emotion recognition from facial expressions, make this feasible. In this paper, we propose two complementary methods for predicting emotions: facial expression recognition and voice analysis. For facial expression recognition, we follow two approaches: one uses Gabor filters for feature extraction with a support vector machine (SVM) for classification, and the other uses a convolutional neural network (CNN). For voice analysis, we extract mel-frequency cepstral coefficients (MFCCs) from speech data and, based on those features, predict the emotion of the speech with a CNN model. Experimental results show that the proposed emotion recognition methods achieve high accuracy and could thus be deployed in real-world applications.
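To make the first facial-expression pipeline concrete, the sketch below shows one way a Gabor filter bank could feed an SVM classifier, assuming OpenCV and scikit-learn. The kernel parameters, the 48x48 crop size, and the random placeholder data are illustrative assumptions, not the configuration used in the paper.

```python
# Minimal, hypothetical sketch of the Gabor-filter + SVM facial pipeline (not the authors' code).
import cv2
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def gabor_features(gray, ksize=31, sigmas=(2.0, 4.0), thetas=np.arange(0, np.pi, np.pi / 4),
                   lambd=10.0, gamma=0.5):
    """Convolve a grayscale face crop with a small Gabor filter bank and
    pool the mean and standard deviation of each filter response."""
    feats = []
    for sigma in sigmas:
        for theta in thetas:
            kernel = cv2.getGaborKernel((ksize, ksize), sigma, theta, lambd, gamma,
                                        psi=0, ktype=cv2.CV_32F)
            resp = cv2.filter2D(gray, cv2.CV_32F, kernel)
            feats.extend([resp.mean(), resp.std()])
    return np.asarray(feats, dtype=np.float32)

# Placeholder data standing in for 48x48 face crops and 7 emotion labels (e.g. JAFFE-style classes).
rng = np.random.default_rng(0)
faces = rng.random((100, 48, 48)).astype(np.float32)
labels = rng.integers(0, 7, size=100)

X = np.stack([gabor_features(img) for img in faces])
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.2, random_state=0)
clf = SVC(kernel="rbf", C=10.0, gamma="scale").fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```

Pooling the mean and standard deviation of each filter response keeps the feature vector compact while preserving the orientation and scale cues that Gabor filters are typically used to capture.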


Availability of data and materials

Data are available in publicly accessible repositories that do not issue DOIs. Publicly available datasets were analysed in this study and can be found at: https://www.kasrl.org/jaffe_download.html (accessed on 12th March 2022), https://www.kaggle.com/datasets/uwrfkaggler/ravdess-emotional-speech-audio (accessed on 13th April 2022), https://www.kaggle.com/datasets/barelydedicated/savee-database (accessed on 13th April 2022), https://www.kaggle.com/datasets/ejlok1/toronto-emotional-speech-set-tess (accessed on 15th April 2022), and https://www.kaggle.com/datasets/ejlok1/cremad (accessed on 17th April 2022).
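As an illustration of how the speech corpora listed above could be processed with the MFCC-plus-CNN approach described in the abstract, here is a minimal sketch assuming librosa and TensorFlow/Keras. The 40 coefficients, 174-frame padding, and layer sizes are illustrative assumptions rather than the authors' published configuration; the repository linked under Code Availability is the authoritative reference.

```python
# Minimal, hypothetical sketch of the MFCC + 1D-CNN speech pipeline (illustrative, not the published model).
import numpy as np
import librosa
import tensorflow as tf

N_MFCC, MAX_FRAMES, N_CLASSES = 40, 174, 8   # 8 classes as in RAVDESS; all values are assumptions

def mfcc_features(wav_path, sr=22050):
    """Load one clip, compute MFCCs, and pad/trim to a fixed number of frames."""
    y, sr = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=N_MFCC)          # shape: (N_MFCC, frames)
    mfcc = librosa.util.fix_length(mfcc, size=MAX_FRAMES, axis=1)   # pad or trim the time axis
    return mfcc.T.astype(np.float32)                                # shape: (MAX_FRAMES, N_MFCC)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(MAX_FRAMES, N_MFCC)),
    tf.keras.layers.Conv1D(64, 5, activation="relu"),
    tf.keras.layers.MaxPooling1D(2),
    tf.keras.layers.Conv1D(128, 5, activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(N_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# Usage: X = np.stack([mfcc_features(p) for p in wav_paths]); model.fit(X, y, epochs=30, validation_split=0.1)
```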

Code Availability

The code is available in the following GitHub repository: https://github.com/AayushiChaudhari5694/EmotionRecognition_Image_Speech.git.


Funding

This research received no external funding.

Author information

Authors and Affiliations

Authors

Contributions

AC data curation, investigation, writing—original draft, implementation. CB methodology, project administration, writing—review and editing. TTN writing—review and editing. NP data curation, investigation, implementation, testing. KP methodology, writing—review and editing, implementation. KS methodology, writing—review and editing, implementation.

Corresponding author

Correspondence to Chintan Bhatt.

Ethics declarations

Conflict of Interest

The authors declare no conflict of interest.

Ethical Approval

Not applicable.

Consent to Participate

All authors have read and agreed to participate in the publication of the manuscript.

Consent for Publication

All authors have read and agreed to publish the latest version of the manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Enabling Innovative Computational Intelligence Technologies for IOT” guest edited by Omer Rana, Rajiv Misra, Alexander Pfeiffer, Luigi Troiano and Nishtha Kesswani.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Chaudhari, A., Bhatt, C., Nguyen, T.T. et al. Emotion Recognition System via Facial Expressions and Speech Using Machine Learning and Deep Learning Techniques. SN COMPUT. SCI. 4, 363 (2023). https://doi.org/10.1007/s42979-022-01633-9


  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s42979-022-01633-9

Keywords
