Abstract
Sentiment analysis makes it possible to determine people's feelings and opinions about a subject or product from their social media posts. With the pervasive use of the Internet and smart devices, the data users produce daily can arrive in different modalities, such as text, image, audio, and video. Multimodal sentiment analysis aims to reveal the sentiment of a user's post by analyzing the data in these modalities as a whole. A major challenge in multimodal sentiment analysis is how to combine the sentiment obtained from the different modalities while preserving the sentiment and meaning of the post. Moreover, many studies apply the same classifier to every modality, even though different classifiers are effective on different feature sets. In this study, a soft voting-based ensemble model is proposed that exploits the complementary performance of different classifiers on different modalities. In the proposed model, deep features are extracted from the multimodal datasets with deep learning methods (BiLSTM, CNN). After feature selection is applied to the fused text and image features, the final feature sets are classified with the soft voting-based ensemble learning model. The performance of the proposed model was tested on two benchmark datasets consisting of text–image pairs. The experimental results show that the proposed model outperforms several competing models on the same datasets.
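The soft-voting stage described above can be illustrated with a minimal sketch. This is not the authors' exact pipeline: the random feature matrix stands in for the fused BiLSTM/CNN text–image features after feature selection, and the three base classifiers are illustrative choices. Soft voting averages each base model's predicted class probabilities and picks the class with the highest mean probability.

```python
# Minimal sketch of a soft voting ensemble over fused features.
# Placeholder data stands in for the selected BiLSTM/CNN text+image features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))            # fused feature vectors (placeholder)
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # placeholder sentiment labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# voting="soft" averages predict_proba outputs, so every base estimator
# must expose class probabilities (hence probability=True for the SVM).
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("svm", SVC(probability=True)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ],
    voting="soft",
)
ensemble.fit(X_tr, y_tr)

proba = ensemble.predict_proba(X_te)  # averaged class probabilities
pred = ensemble.predict(X_te)         # argmax of the averaged probabilities
print(proba.shape, pred.shape)
```

Because each classifier contributes a full probability distribution rather than a single hard label, a model that is confident on one modality's features can outweigh uncertain votes, which is the motivation for soft rather than hard voting.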








Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Salur, M.U., Aydın, İ. A soft voting ensemble learning-based approach for multimodal sentiment analysis. Neural Comput & Applic 34, 18391–18406 (2022). https://doi.org/10.1007/s00521-022-07451-7