Automatic Sentiment Labelling of Multimodal Data

  • Conference paper
  • Data Management Technologies and Applications (DATA 2022, DATA 2021)

Abstract

This study investigates the challenging problem of automatically providing sentiment labels for multimodal data, containing both image and textual information, for training and testing supervised machine learning models. Because the image and text components convey sentiment both individually and collectively, assessing the sentiment of multimodal data typically requires both sources of information. Consequently, the majority of studies classify sentiment by combining image and text features (‘Image+Text-features’). In this study, we propose ‘Combined-Text-Features’, which incorporate the object names and attributes identified in an image, together with any superimposed or captioned text accompanying that image, and we use these text features to classify the sentiment of multimodal data. Building on our prior research, we employ the Afinn labelling method to automatically assign sentiment labels to the ‘Combined-Text-Features’. We test whether classifier models using these ‘Combined-Text-Features’ with Afinn labelling can achieve results comparable to those obtained with other multimodal features and with human labelling. CNN, BiLSTM, and BERT models are used in experiments on two multimodal datasets. The experimental results demonstrate the usefulness of the ‘Combined-Text-Features’ as a representation of multimodal data for the sentiment classification task, and also suggest that the Afinn labelling approach can be a feasible alternative to human labelling for providing sentiment labels.
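
To make the labelling approach concrete, the following minimal Python sketch illustrates the pipeline the abstract describes, using the open-source `afinn` package. The `combine_text_features` helper and the zero-score decision thresholds are illustrative assumptions, not the authors' exact implementation.

```python
# Hypothetical sketch of the automatic labelling step described above:
# object names and attributes detected in an image are merged with any
# superimposed/caption text into 'Combined-Text-Features', which the
# AFINN lexicon then scores to produce a sentiment label automatically.
from afinn import Afinn

afinn = Afinn()  # English AFINN word list (Nielsen, 2011)

def combine_text_features(object_names, attributes, caption_text):
    """Merge image-derived terms and accompanying text into one string.
    (Illustrative assumption; the paper's exact combination may differ.)"""
    return " ".join(object_names + attributes + [caption_text])

def afinn_label(text):
    """Map the summed AFINN word scores onto a sentiment label.
    (Thresholds at zero are an assumption, not the authors' settings.)"""
    score = afinn.score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Example: an image with a detected 'dog' carrying the attribute 'happy',
# plus the superimposed caption 'best day ever'.
features = combine_text_features(["dog"], ["happy"], "best day ever")
print(afinn_label(features))  # -> "positive" ('happy' scores +3 in AFINN)
```

In the paper's setting, the labels produced this way serve as training targets for the CNN, BiLSTM, and BERT classifiers in place of human annotation.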

Acknowledgements

This work was supported by the College of Engineering, University of Galway, Ireland.

Author information

Correspondence to Sumana Biswas or Josephine Griffith.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Biswas, S., Young, K., Griffith, J. (2023). Automatic Sentiment Labelling of Multimodal Data. In: Cuzzocrea, A., Gusikhin, O., Hammoudi, S., Quix, C. (eds) Data Management Technologies and Applications. DATA 2022, DATA 2021. Communications in Computer and Information Science, vol 1860. Springer, Cham. https://doi.org/10.1007/978-3-031-37890-4_8

  • DOI: https://doi.org/10.1007/978-3-031-37890-4_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-37889-8

  • Online ISBN: 978-3-031-37890-4

  • eBook Packages: Computer Science, Computer Science (R0)
