Personalized emotion analysis based on fuzzy multi-modal transformer model

Published in Applied Intelligence (Springer).

Abstract

Analyzing and detecting human intentions and emotions is an important means of improving communication between users and machines in human-computer interaction (HCI) and human-robot interaction (HRI). Despite significant progress with state-of-the-art (SOTA) Transformer-based models, obstacles persist in managing complicated input interdependencies and extracting intricate contextual semantics. Moreover, existing approaches lack practical applicability and struggle to accurately capture and effectively manage the inherent complexity and unpredictability of human emotions. In recognition of these research gaps, we introduce a robust and innovative fuzzy multi-modal Transformer (FMMT) model. Our fuzzy Transformer model heightens the comprehension of emotional contexts by concurrently analyzing audio, visual, and text data through three distinct branches. By incorporating fuzzy mathematical theory and introducing a unique temporal embedding technique to trace the evolution of emotional states, it effectively handles the inherent uncertainty in human emotions, thereby filling a significant void in emotional AI. Building upon the FMMT model, we further explore an approach to emotion expression. We also perform a comparison against SOTA baseline methods and a detailed ablation study; the results show that the proposed FMMT outperforms the baselines. Finally, we conduct detailed experimental verification and empirical analyses of the practicality of the designed method by verifying uncertain emotions and analyzing emotional state transitions combined with personalized factors. Overall, our research makes a significant contribution to emotion analysis through a novel fuzzy Transformer model that enhances emotion perception and advances methods for analyzing emotional expression, setting it apart from prior studies.
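The abstract does not specify how fuzzy mathematical theory is applied inside the FMMT model. As an illustrative sketch only, the hypothetical Python snippet below shows one common way fuzzy logic can represent uncertainty in emotion analysis: mapping a crisp emotion-intensity score onto overlapping triangular membership functions, so that an ambiguous input belongs partially to several emotion-intensity categories at once. The function names and breakpoints here are assumptions for illustration, not the authors' formulation.

```python
def triangular(x, a, b, c):
    """Triangular membership function: 0 at a, peak 1 at b, back to 0 at c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def fuzzify_intensity(x):
    """Map a crisp emotion-intensity score in [0, 1] to fuzzy memberships.

    Overlapping sets mean an ambiguous score (e.g. 0.25) is partly 'low'
    and partly 'medium', capturing uncertainty instead of forcing a
    single hard label.
    """
    return {
        "low":    triangular(x, -0.5, 0.0, 0.5),
        "medium": triangular(x,  0.0, 0.5, 1.0),
        "high":   triangular(x,  0.5, 1.0, 1.5),
    }
```

For example, a score of 0.25 yields equal 0.5 membership in "low" and "medium" and zero in "high", which a downstream fusion stage could weigh rather than committing to one discrete emotion state.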


[Figures 1–19 appear in the full text of the article; captions are not available in this preview.]


Data availability

The participants of this study did not give written consent for their data to be shared publicly; therefore, the data are not available.


Author information

Correspondence to Mei Choo Ang.

Ethics declarations

Conflict of interest

The author has no competing interests to declare in relation to the content of this article.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Liu, J., Ang, M.C., Chaw, J.K. et al. Personalized emotion analysis based on fuzzy multi-modal transformer model. Appl Intell 55, 227 (2025). https://doi.org/10.1007/s10489-024-05954-5
