Abstract
This study investigates the challenging problem of automatically providing sentiment labels for multimodal data, containing both image and textual information, that is used to train and test supervised machine learning models. Because the image and text components convey sentiment both individually and collectively, assessing the sentiment of multimodal data typically requires both image and text information. Consequently, the majority of studies classify sentiment by combining image and text features (‘Image+Text-features’). In this study, we propose ‘Combined-Text-Features’ that incorporate the names and attributes of the objects identified in an image, as well as any superimposed text or caption accompanying that image, and we use these text features to classify the sentiment of multimodal data. Inspired by our prior research, we employ the Afinn labelling method to automatically assign sentiment labels to the ‘Combined-Text-Features’. We test whether classifier models using these ‘Combined-Text-Features’ with Afinn labelling can achieve results comparable to those obtained with other multimodal features and with human labelling. CNN, BiLSTM, and BERT models are used for the experiments on two multimodal datasets. The experimental results demonstrate the usefulness of the ‘Combined-Text-Features’ as a representation of multimodal data for the sentiment classification task. The results also suggest that the Afinn labelling approach can be a feasible alternative to human labelling for providing sentiment labels.
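The labelling idea described in the abstract can be illustrated with a minimal sketch. It assumes that object and attribute names have already been extracted from the image by some detector, and it uses a small hypothetical word list in the style of AFINN (integer valence per word) rather than the published lexicon. The function names, the toy entries, and the zero threshold for mapping scores to labels are all assumptions made for illustration, not details taken from the chapter.

```python
import re

# Hypothetical toy lexicon in the style of AFINN: each word maps to an
# integer valence. These entries are illustrative only and are NOT the
# published AFINN word list.
TOY_LEXICON = {
    "happy": 3,
    "beautiful": 3,
    "smiling": 2,
    "sad": -2,
    "broken": -1,
    "terrible": -3,
}


def combine_text_features(object_tags, image_text):
    """Join detected object/attribute names with caption or superimposed text."""
    return " ".join(list(object_tags) + [image_text]).strip()


def lexicon_score(text, lexicon=TOY_LEXICON):
    """Sum the valence of every token found in the lexicon (unknown words count 0)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(token, 0) for token in tokens)


def label_from_score(score):
    """Map a numeric score to a label; thresholding at zero is an assumption."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Example: tags from a (hypothetical) object detector plus the caption text.
combined = combine_text_features(
    ["smiling woman", "broken umbrella"], "What a beautiful day"
)
label = label_from_score(lexicon_score(combined))
```

In a real pipeline the toy dictionary would be replaced by the full AFINN word list, and the resulting labels would serve as training targets for the text classifiers.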
Acknowledgements
This work was supported by the College of Engineering, University of Galway, Ireland.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Biswas, S., Young, K., Griffith, J. (2023). Automatic Sentiment Labelling of Multimodal Data. In: Cuzzocrea, A., Gusikhin, O., Hammoudi, S., Quix, C. (eds) Data Management Technologies and Applications. DATA 2022, DATA 2021. Communications in Computer and Information Science, vol 1860. Springer, Cham. https://doi.org/10.1007/978-3-031-37890-4_8
DOI: https://doi.org/10.1007/978-3-031-37890-4_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-37889-8
Online ISBN: 978-3-031-37890-4
eBook Packages: Computer Science, Computer Science (R0)