UEFN: Efficient uncertainty estimation fusion network for reliable multimodal sentiment analysis

Published in: Applied Intelligence

Abstract

The rapid evolution of the digital era has greatly transformed social media, resulting in more diverse emotional expressions and increasingly complex public discourse. Consequently, identifying relationships within multimodal data has grown more challenging. Most current multimodal sentiment analysis (MSA) methods concentrate on merging data from diverse modalities into an integrated feature representation, leveraging the complementary nature of multimodal data to enhance recognition performance. However, these approaches often overlook prediction reliability. To address this, we propose the uncertainty estimation fusion network (UEFN), a reliable MSA method based on uncertainty estimation. UEFN combines the Dirichlet distribution and Dempster-Shafer evidence theory (DSET) to predict the probability distribution and uncertainty of the text, speech, and image modalities, fusing the predictions at the decision level. Specifically, the method first represents the contextual features of the text, speech, and image modalities separately. It then employs a fully connected neural network to transform the features of each modality into evidence. Next, it parameterizes the evidence of each modality via the Dirichlet distribution and estimates the per-modality probability distribution and uncertainty. Finally, DSET fuses the per-modality predictions into the sentiment analysis result and its uncertainty estimate; this fusion step forms the multimodal decision fusion layer (MDFL). Additionally, on the basis of the modality uncertainties produced by subjective logic theory, we compute feature weights, apply them to the corresponding features, concatenate the weighted features, and feed them into a feedforward neural network for sentiment classification, forming the adaptive weight fusion layer (AWFL). MDFL and AWFL are then trained jointly in a multitask setting. Experimental comparisons demonstrate that UEFN not only achieves excellent performance but also provides uncertainty estimates alongside its predictions, enhancing the reliability and interpretability of the results.
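The evidence-to-opinion mapping and the decision-level fusion described in the abstract follow the standard subjective-logic recipe: non-negative evidence is shifted into Dirichlet parameters, belief and uncertainty masses are read off the Dirichlet strength, and per-modality opinions are combined with a reduced Dempster-Shafer rule. The sketch below illustrates these two steps for a binary sentiment task under those assumptions; the evidence values, function names, and the two-opinion combination rule are illustrative, not the paper's exact implementation.

```python
import numpy as np

def dirichlet_opinion(evidence):
    """Map non-negative per-class evidence to a subjective-logic opinion.

    alpha = evidence + 1 are the Dirichlet parameters; the belief masses and
    the uncertainty mass sum to 1 by construction.
    """
    evidence = np.asarray(evidence, dtype=float)
    num_classes = evidence.size
    alpha = evidence + 1.0                 # Dirichlet parameters
    strength = alpha.sum()                 # Dirichlet strength S
    belief = evidence / strength           # belief mass b_k = e_k / S
    uncertainty = num_classes / strength   # uncertainty mass u = K / S
    probs = alpha / strength               # expected class probabilities
    return belief, uncertainty, probs

def ds_combine(b1, u1, b2, u2):
    """Reduced Dempster-Shafer combination of two opinions (decision-level fusion)."""
    # Conflict: total belief mass the two opinions assign to different classes.
    conflict = float(np.outer(b1, b2).sum() - (b1 * b2).sum())
    scale = 1.0 / (1.0 - conflict)
    belief = scale * (b1 * b2 + b1 * u2 + b2 * u1)
    uncertainty = scale * (u1 * u2)
    return belief, uncertainty

# Hypothetical evidence: a confident text modality and a conflicted audio one.
b_t, u_t, _ = dirichlet_opinion([9.0, 1.0])
b_a, u_a, _ = dirichlet_opinion([4.0, 6.0])
b_fused, u_fused = ds_combine(b_t, u_t, b_a, u_a)
```

Combining opinions can only reduce uncertainty relative to the least certain modality, which is what makes the fused uncertainty usable as a reliability signal. A plausible (assumed) reading of the AWFL step is then to weight each modality's features by something like 1 − u before concatenation, so that highly uncertain modalities contribute less to the classifier.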

Data Availability

The datasets supporting our research are publicly accessible at the following URLs: (https://paperswithcode.com/dataset/cmu-mosi) and (https://paperswithcode.com/dataset/cmu-mosei).

Notes

  1. https://github.com/CMU-MultiComp-Lab/CMU-MultimodalSDK

  2. https://imotions.com/products/imotions-lab/modules/fea-facial-expression-analysis/


Acknowledgements

The authors gratefully acknowledge financial support from the Applied Research Project of Yuncheng University (Grant No. YY-202312, 2023). We would also like to express our sincere gratitude to Dr. Miao Xia Chen for her valuable contributions to this work.

Author information

Contributions

Shuai Wang contributed to conceptualization, methodology, data collection and analysis, writing the original draft, review, and editing, as well as funding acquisition. K. Ratnavelu and Abdul Samad Bin Shibghatullah provided project supervision, administration, and essential resources. All authors reviewed and approved the final manuscript.

Corresponding author

Correspondence to Shuai Wang.

Ethics declarations

Competing Interests

The authors declare that they have no competing interests.

Ethical and informed consent for data used

This study uses publicly available open-source datasets that raise no ethical concerns: CMU-MOSI and CMU-MOSEI.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article


Cite this article

Wang, S., Ratnavelu, K. & Bin Shibghatullah, A.S. UEFN: Efficient uncertainty estimation fusion network for reliable multimodal sentiment analysis. Appl Intell 55, 171 (2025). https://doi.org/10.1007/s10489-024-06113-6
