Hybrid multi-modal emotion recognition framework based on InceptionV3DenseNet

Alamgir, Fakir Mashuque; Alam, Md. Shafiul

doi:10.1007/s11042-023-15066-w

Published: 27 March 2023

Multimedia Tools and Applications, volume 82, pages 40375–40402 (2023)

Abstract

Emotion recognition is a complex research area because individuals express emotional cues through several modalities, such as audio, facial expressions, and language. Recognizing emotion from a single modality is not always feasible, as each modality is disturbed by several factors, and existing models fall short of accurately identifying individuals' expressions. In this paper, a novel hybrid multi-modal emotion recognition framework, InceptionV3DenseNet, is proposed to improve recognition accuracy. First, contextual features are extracted from the video, audio, and text modalities. From the video modality, shot length, lighting key, motion, and color are extracted. Zero-crossing rate, Mel-frequency cepstral coefficients (MFCC), energy, and pitch are extracted from the audio modality, and unigram, bigram, and TF-IDF features are extracted from the text modality. The feature extraction stage targets high-level features with better generalization capability. The extracted features are fused using multi-set integrated canonical correlation analysis (MICCA), which detects the correlation between multimodal features to provide better performance with a single learning phase, and the fused features are provided as input to the proposed hybrid network model. The proposed hybrid deep learning model then classifies emotional states with regard to accuracy and reliability. Simulations are conducted in MATLAB and evaluated on the MELD and RAVDESS datasets. The results indicate that the proposed model is more efficient and accurate than the compared models, attaining an overall accuracy of 74.87% on MELD and 95.25% on RAVDESS.
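
As a rough illustration of three of the low-level descriptors named in the abstract (zero-crossing rate, energy, and unigram/bigram TF-IDF), the following Python sketch computes them with the standard library only. It is not the authors' implementation, which was carried out in MATLAB. The function names, the whitespace tokenizer, and the particular TF-IDF weighting (term count divided by document length, times ln(N/df)) are assumptions made for this example. MFCC, pitch, the video features, and the MICCA fusion step are not covered.

```python
import math
from collections import Counter


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def short_time_energy(frame):
    """Mean squared amplitude of one audio frame."""
    return sum(x * x for x in frame) / len(frame) if frame else 0.0


def ngrams(tokens, n):
    """Contiguous n-grams of a token list, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(documents):
    """TF-IDF over unigrams and bigrams, returned as one dict per document.

    Assumed weighting: tf = count / terms in document, idf = ln(N / df).
    Other variants exist; the abstract does not say which one is used.
    """
    term_lists = []
    for text in documents:
        tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
        term_lists.append(ngrams(tokens, 1) + ngrams(tokens, 2))

    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for terms in term_lists for term in set(terms))
    n_docs = len(documents)

    weights = []
    for terms in term_lists:
        counts = Counter(terms)
        total = len(terms)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights
```

For example, `zero_crossing_rate([0.5, -0.5, 0.5, -0.5])` returns `1.0` because every adjacent pair changes sign, and `tfidf(["i am happy", "i am sad"])` gives zero weight to the shared terms "i", "am", and "i am", leaving only the terms that distinguish the two utterances.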

Data availability

Data sharing is not applicable to this article.

Author information

Authors and Affiliations

Department of Electrical & Electronic Engineering, University of Dhaka, Dhaka, Bangladesh
Fakir Mashuque Alamgir & Md. Shafiul Alam

Corresponding author

Correspondence to Fakir Mashuque Alamgir.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Alamgir, F.M., Alam, M.S. Hybrid multi-modal emotion recognition framework based on InceptionV3DenseNet. Multimed Tools Appl 82, 40375–40402 (2023). https://doi.org/10.1007/s11042-023-15066-w

Received: 21 March 2022
Revised: 05 October 2022
Accepted: 02 March 2023
Published: 27 March 2023
Issue Date: November 2023
DOI: https://doi.org/10.1007/s11042-023-15066-w
