Abstract
Analyzing and detecting human intentions and emotions is an important means of improving communication between users and machines in the areas of human-computer interaction (HCI) and human-robot interaction (HRI). Despite significant progress with state-of-the-art (SOTA) Transformer-based models, obstacles persist in managing complicated input interdependencies and extracting intricate contextual semantics. Moreover, existing methods lack practical applicability and struggle to accurately capture and effectively manage the inherent complexity and unpredictability of human emotions. In recognition of these research gaps, we introduce a robust and innovative fuzzy multi-modal Transformer (FMMT) model. Our fuzzy Transformer model heightens the comprehension of emotional contexts by concurrently analyzing audio, visual, and text data through three distinct branches. By incorporating fuzzy mathematical theory and introducing a unique temporal embedding technique to trace the evolution of emotional states, it effectively handles the inherent uncertainty in human emotions, thereby filling a significant void in emotional AI. Building upon the FMMT model, we further explored the emotion expression approach. Furthermore, a performance comparison with SOTA baseline methods and a detailed ablation study were performed; the results show that the proposed FMMT outperforms the baseline methods. Finally, we conducted detailed experimental verification and empirical analyses of the practicality of the designed method by verifying emotion uncertainty and analyzing emotional state transitions combined with personalized factors. Overall, our research makes a significant contribution to emotion analysis through a novel fuzzy Transformer model that enhances emotion perception and advances the methods for analyzing emotional expression, giving it an edge over prior studies.
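The abstract's core idea, using fuzzy set theory to represent uncertain emotional states and fusing three modality branches, can be illustrated with a minimal sketch. This is not the authors' FMMT implementation; the triangular membership functions, the three fuzzy sets, and the confidence-weighted late fusion are all illustrative assumptions chosen for brevity.

```python
import numpy as np

def triangular(x, a, b, c):
    # Triangular fuzzy membership: rises on [a, b], falls on [b, c], 0 elsewhere.
    return float(np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0))

def fuzzify(valence):
    # Map a crisp valence score in [0, 1] to membership degrees over three
    # hypothetical fuzzy emotion sets: negative, neutral, positive.
    return np.array([
        triangular(valence, -0.5, 0.0, 0.5),  # negative
        triangular(valence,  0.0, 0.5, 1.0),  # neutral
        triangular(valence,  0.5, 1.0, 1.5),  # positive
    ])

def fuse(branch_probs, branch_conf):
    # Confidence-weighted late fusion of per-modality (audio, visual, text)
    # emotion distributions: weight each branch, then renormalize.
    w = np.asarray(branch_conf, dtype=float)
    w = w / w.sum()
    fused = np.einsum("m,mc->c", w, np.asarray(branch_probs, dtype=float))
    return fused / fused.sum()

# A valence of 0.5 is fully "neutral" under these membership functions.
print(fuzzify(0.5))
# Fuse three hypothetical branch outputs over three emotion classes.
print(fuse([[0.7, 0.2, 0.1], [0.5, 0.3, 0.2], [0.6, 0.3, 0.1]],
           [0.9, 0.5, 0.8]))
```

A score near a set boundary (e.g. 0.25) yields partial membership in two sets at once, which is how a fuzzy formulation expresses uncertainty that a hard classifier would discard.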
Data availability
The participants of this study did not give written consent for their data to be shared publicly; therefore, the data are not available.
Ethics declarations
Conflict of interest
The authors have no competing interests to declare in relation to the content of this article.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Liu, J., Ang, M.C., Chaw, J.K. et al. Personalized emotion analysis based on fuzzy multi-modal transformer model. Appl Intell 55, 227 (2025). https://doi.org/10.1007/s10489-024-05954-5