Single Modality and Joint Fusion for Emotion Recognition on RAVDESS Dataset

  • Original Research
  • Published:
SN Computer Science

Abstract

Over the last decade, emotion recognition has become a widely researched area and is worth considering in any project related to affective computing. Owing to the broad range of applications of this discipline, the development of emotion recognition systems has emerged as a lucrative opportunity in the corporate sector. Emotions can be detected in many ways, such as from facial expressions, speech, text, and gestures. Hence, this article presents a comprehensive survey of emotion recognition methods, focusing on three key modalities: visual, speech, and audio-visual. Moreover, a novel approach for each modality is proposed, and a comparative analysis is conducted against existing methods to assess their effectiveness. The results of this study reveal that, on the RAVDESS dataset, face emotion recognition can outperform speech emotion recognition, the opposite of what previous literature reported. Combining the modalities yielded even better results, with an accuracy of 89% that compares favorably with previous works. Finally, this article concludes by offering new insights and suggestions for future research.
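
To make the joint-fusion idea concrete, the following is a minimal sketch of feature-level fusion of the audio and visual modalities in PyTorch. The branch architectures, input sizes, and layer widths below are illustrative assumptions for exposition, not the authors' actual models: each modality is encoded by a small CNN, the feature vectors are concatenated, and a shared head classifies the eight RAVDESS emotions.

import torch
import torch.nn as nn

def make_branch(in_channels, feat_dim=128):
    """Small CNN encoder: image or spectrogram -> fixed-size feature vector."""
    return nn.Sequential(
        nn.Conv2d(in_channels, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(32, feat_dim), nn.ReLU(),
    )

class JointFusionNet(nn.Module):
    """Joint (feature-level) fusion: encode each modality separately,
    concatenate the feature vectors, then classify the 8 RAVDESS emotions."""
    def __init__(self, feat_dim=128, n_classes=8):
        super().__init__()
        self.audio = make_branch(1, feat_dim)    # 1-channel mel-spectrogram
        self.visual = make_branch(3, feat_dim)   # RGB face crop
        self.classifier = nn.Sequential(
            nn.Linear(2 * feat_dim, 64), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(64, n_classes),
        )

    def forward(self, spectrogram, face):
        fused = torch.cat([self.audio(spectrogram), self.visual(face)], dim=1)
        return self.classifier(fused)

# Dummy batch: 4 spectrograms (1 x 128 x 128) and 4 face crops (3 x 112 x 112).
model = JointFusionNet()
logits = model(torch.randn(4, 1, 128, 128), torch.randn(4, 3, 112, 112))
print(logits.shape)  # torch.Size([4, 8])

Fusing at the feature level lets the classifier exploit cross-modal correlations that separate per-modality classifiers cannot see, which is consistent with the combined modalities outperforming either single modality.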



Data availability statement

The RAVDESS database used in this paper is available from https://zenodo.org/record/1188976
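
RAVDESS files encode their labels in the filename as seven hyphen-separated two-digit fields (modality, vocal channel, emotion, intensity, statement, repetition, actor). A small parsing sketch, assuming this documented convention holds for the downloaded copy; the field meanings should be verified against the Zenodo record above:

from pathlib import Path

# Emotion codes per the RAVDESS naming convention.
EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}

def parse_ravdess_name(path):
    """Return (modality, vocal_channel, emotion, actor) for one RAVDESS file."""
    fields = Path(path).stem.split("-")
    modality, vocal_channel, emotion = fields[0], fields[1], fields[2]
    actor = fields[6]
    return modality, vocal_channel, EMOTIONS[emotion], actor

print(parse_ravdess_name("03-01-06-01-02-01-12.wav"))
# ('03', '01', 'fearful', '12')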


Author information


Corresponding author

Correspondence to Syrine Haddad.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Human/animal rights statement

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Haddad, S., Daassi, O. & Belghith, S. Single Modality and Joint Fusion for Emotion Recognition on RAVDESS Dataset. SN COMPUT. SCI. 5, 669 (2024). https://doi.org/10.1007/s42979-024-03020-y


  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s42979-024-03020-y
