Single Modality and Joint Fusion for Emotion Recognition on RAVDESS Dataset

Haddad, Syrine; Daassi, Olfa; Belghith, Safya

doi:10.1007/s42979-024-03020-y

Original Research
Published: 25 June 2024

Volume 5, article number 669, (2024)
SN Computer Science

Abstract

Over the last decade, emotion recognition has become a widely researched area worth considering in any project related to affective computing. Owing to the many applications of this discipline, the development of emotion recognition systems has emerged as a lucrative opportunity in the corporate sector. Emotions can be recognized from many cues, such as facial expressions, speech, text, and gestures. This article therefore presents a comprehensive survey of emotion recognition methods, focusing on three key modalities: visual, speech, and audio-visual. Moreover, a novel approach is proposed for each modality, and a comparative analysis against existing methods is conducted to assess its effectiveness. The results show that face emotion recognition on the RAVDESS dataset can achieve better results than speech emotion recognition, the opposite of what previous literature reported. Combining the modalities achieved even better results than previous works, with an accuracy of 89%. Finally, the article concludes by offering new insights and suggestions for future research.

Data availability statement

The RAVDESS database used in this paper is available from https://zenodo.org/record/1188976

Author information

Authors and Affiliations

Laboratory of Robotics, Informatics and Complex Systems (RISC Lab-LR16ES07), Tunis, Tunisia
Syrine Haddad, Olfa Daassi & Safya Belghith
National Engineering School of Tunis, University of Tunis El Manar, Tunis, 1002, Tunisia
Syrine Haddad & Safya Belghith
National Engineering School of Carthage, University of Carthage, Carthage, 2035, Tunisia
Olfa Daassi

Corresponding author

Correspondence to Syrine Haddad.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Human/animal rights statement

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Haddad, S., Daassi, O. & Belghith, S. Single Modality and Joint Fusion for Emotion Recognition on RAVDESS Dataset. SN COMPUT. SCI. 5, 669 (2024). https://doi.org/10.1007/s42979-024-03020-y

Received: 01 December 2023
Accepted: 29 May 2024
Published: 25 June 2024
DOI: https://doi.org/10.1007/s42979-024-03020-y
