Abstract
Over the last decade, emotion recognition has become a widely researched area within affective computing. Given the broad range of applications, the development of emotion recognition systems has also attracted considerable commercial interest. Emotions can be recognized from many sources, such as facial expressions, speech, text, and gestures. This article presents a comprehensive survey of emotion recognition methods, focusing on three key modalities: visual, speech, and audio-visual. In addition, a novel approach is proposed for each modality and compared with existing methods to assess its effectiveness. The results show that, on the RAVDESS dataset, facial emotion recognition can achieve better results than speech emotion recognition, the opposite of what previous literature reported. Combining the modalities appears to improve results further, reaching an accuracy of 89%, better than previous works. Finally, the article offers new insights and suggestions for future research.









Data availability statement
The RAVDESS database used in this paper is available from https://zenodo.org/record/1188976
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Human/animal rights statement
This article does not contain any studies with human participants or animals performed by any of the authors.
About this article
Cite this article
Haddad, S., Daassi, O. & Belghith, S. Single Modality and Joint Fusion for Emotion Recognition on RAVDESS Dataset. SN COMPUT. SCI. 5, 669 (2024). https://doi.org/10.1007/s42979-024-03020-y
DOI: https://doi.org/10.1007/s42979-024-03020-y