Abstract
Text-based sentiment analysis is a popular application of artificial intelligence that has benefited over the past decade from the growth of digital social networks and their nearly unlimited supply of data. Today, social network users can combine different types of information in a single post, such as images, videos, GIFs, and live streams, which allows them to express more complex thoughts and opinions. The goal of our study is to analyze the impact that incorporating different types of multimodal information may have on social media sentiment analysis. In particular, we pay special attention to the interaction between text messages and images with and without text captions. To study this interaction, we first create a new dataset in Spanish that contains tweets with images. Afterwards, we manually label several sentiments for each tweet, as follows: the overall tweet sentiment, the sentiment of the text, the sentiment of each individual image, the sentiment of the caption (if present), and, in cases where a single tweet has several images, the aggregate sentiment of all images in the tweet. We conclude that incorporating visual information into text-based sentiment analysis improves the performance of the classifiers that determine the overall sentiment of a tweet by an average of 25.5%.
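To make the labeling scheme concrete, the following is a minimal sketch of one annotated tweet in Python. The class and field names (`TweetAnnotation`, `images_aggregate`, and the three-valued `Sentiment` scale) are illustrative assumptions for this sketch, not the dataset's actual schema or label set.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional

class Sentiment(Enum):
    # Assumed three-level polarity scale; the paper's actual label set may differ.
    NEGATIVE = -1
    NEUTRAL = 0
    POSITIVE = 1

@dataclass
class TweetAnnotation:
    # One manually labeled tweet, mirroring the labels listed in the abstract.
    tweet_id: str
    text: str
    overall: Sentiment                  # sentiment of the tweet as a whole
    text_sentiment: Sentiment           # sentiment of the text alone
    image_sentiments: List[Sentiment] = field(default_factory=list)  # one label per image
    caption_sentiment: Optional[Sentiment] = None  # only when a caption is present
    images_aggregate: Optional[Sentiment] = None   # only for multi-image tweets

# Hypothetical example instance (not taken from the dataset):
example = TweetAnnotation(
    tweet_id="0001",
    text="¡Qué gran día!",
    overall=Sentiment.POSITIVE,
    text_sentiment=Sentiment.POSITIVE,
    image_sentiments=[Sentiment.POSITIVE, Sentiment.NEUTRAL],
    images_aggregate=Sentiment.POSITIVE,
)
print(example.overall.name)  # -> POSITIVE
```

The optional fields capture the conditional labels described above: a caption sentiment exists only when an image carries a caption, and an aggregate image sentiment exists only when a tweet contains more than one image.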
Notes
2. The dataset can be downloaded here: https://github.com/lzun/mssaid. (A minimal loading sketch follows these notes.)
4. Note that in this paper we use images of our own authorship as examples, instead of the ones contained in the dataset, to avoid any violation of the original authors' copyright.
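As referenced in note 2, a minimal loading sketch for the dataset is shown below. It assumes the repository has been cloned locally and that the annotations live in a CSV file; the file name `tweets.csv` and the resulting columns are assumptions, so consult the repository's README for the actual layout.

```python
# Hypothetical loading sketch -- the file name and repository layout are
# assumptions, not the repository's documented structure.
#
#   git clone https://github.com/lzun/mssaid
import pandas as pd

df = pd.read_csv("mssaid/tweets.csv")  # assumed path inside the cloned repo
print(df.head())                       # inspect which annotation columns exist
```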
Acknowledgements
This work was supported by the Universidad Iberoamericana Ciudad de México and the Institute of Applied Research and Technology.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zúñiga-Morales, L.N., González-Ordiano, J.Á., Quiroz-Ibarra, J., Simske, S.J. (2022). Impact Evaluation of Multimodal Information on Sentiment Analysis. In: Pichardo Lagunas, O., Martínez-Miranda, J., Martínez Seis, B. (eds) Advances in Computational Intelligence. MICAI 2022. Lecture Notes in Computer Science, vol. 13613. Springer, Cham. https://doi.org/10.1007/978-3-031-19496-2_2
DOI: https://doi.org/10.1007/978-3-031-19496-2_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19495-5
Online ISBN: 978-3-031-19496-2