Abstract
The rapid evolution of the digital era has greatly transformed social media, resulting in more diverse emotional expressions and increasingly complex public discourse. Consequently, identifying relationships within multimodal data has become more challenging. Most current multimodal sentiment analysis (MSA) methods concentrate on merging data from diverse modalities into an integrated feature representation to enhance recognition performance by leveraging the complementary nature of multimodal data. However, these approaches often overlook prediction reliability. To address this, we propose the uncertainty estimation fusion network (UEFN), a reliable MSA method based on uncertainty estimation. UEFN combines the Dirichlet distribution and Dempster-Shafer evidence theory (DSET) to predict the probability distribution and uncertainty of text, speech, and image modalities, fusing the predictions at the decision level. Specifically, the method first represents the contextual features of the text, speech, and image modalities separately. It then employs a fully connected neural network to transform the features of each modality into evidence. Subsequently, it parameterizes the evidence of each modality via the Dirichlet distribution and estimates the per-modality probability distribution and uncertainty. Finally, we use DSET to fuse the predictions, obtaining the sentiment analysis results together with an uncertainty estimate; this constitutes the multimodal decision fusion layer (MDFL). Additionally, on the basis of the modality uncertainty produced by subjective logic theory, we compute feature weights, apply them to the corresponding features, concatenate the weighted features, and feed them into a feedforward neural network for sentiment classification, forming the adaptive weight fusion layer (AWFL). MDFL and AWFL are then trained jointly in a multitask setting.
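The evidence-to-uncertainty pipeline described above can be illustrated with a minimal sketch. This is not the authors' implementation; it only shows the standard subjective-logic mapping (evidence to Dirichlet parameters to belief and uncertainty masses) and the reduced two-opinion Dempster-Shafer combination rule commonly used in evidential fusion, with hypothetical evidence vectors standing in for the per-modality network outputs.

```python
import numpy as np

def evidence_to_opinion(evidence):
    """Map non-negative evidence to a subjective-logic opinion.

    Dirichlet parameters: alpha_k = e_k + 1, Dirichlet strength S = sum(alpha).
    Belief mass b_k = e_k / S, uncertainty mass u = K / S, so sum(b) + u = 1.
    """
    alpha = evidence + 1.0
    S = alpha.sum()
    belief = evidence / S
    u = len(evidence) / S
    return belief, u

def ds_combine(b1, u1, b2, u2):
    """Reduced Dempster-Shafer rule for combining two opinions.

    C is the conflict mass (belief assigned to different classes by the
    two opinions); the result is renormalized by 1 - C.
    """
    C = b1.sum() * b2.sum() - (b1 * b2).sum()
    scale = 1.0 - C
    b = (b1 * b2 + b1 * u2 + b2 * u1) / scale
    u = u1 * u2 / scale
    return b, u

# Hypothetical binary-sentiment evidence from two modalities.
b_text, u_text = evidence_to_opinion(np.array([4.0, 1.0]))
b_audio, u_audio = evidence_to_opinion(np.array([3.0, 1.0]))
b_fused, u_fused = ds_combine(b_text, u_text, b_audio, u_audio)
```

When the two modalities agree, the fused uncertainty mass is lower than either input's, which is what makes the fused prediction more trustworthy than any single modality's.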
Experimental comparisons demonstrate that the UEFN not only achieves excellent performance but also provides uncertainty estimation along with the predictions, enhancing the reliability and interpretability of the results.
Data Availability
The datasets supporting our research are publicly accessible at the following URLs: (https://paperswithcode.com/dataset/cmu-mosi) and (https://paperswithcode.com/dataset/cmu-mosei).
Acknowledgements
The authors gratefully acknowledge financial support from the Applied Research Project of Yuncheng University (Grant No. YY-202312, 2023). We would also like to express our sincere gratitude to Dr. Miao Xia Chen for her valuable contributions to this work.
Author information
Contributions
Shuai Wang contributed to conceptualization, methodology, data collection and analysis, writing the original draft, review, and editing, as well as funding acquisition. K. Ratnavelu and Abdul Samad Bin Shibghatullah provided project supervision, administration, and essential resources. All authors reviewed and approved the final manuscript.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, S., Ratnavelu, K. & Bin Shibghatullah, A.S. UEFN: Efficient uncertainty estimation fusion network for reliable multimodal sentiment analysis. Appl Intell 55, 171 (2025). https://doi.org/10.1007/s10489-024-06113-6