
Image–Text Sentiment Analysis Via Context Guided Adaptive Fine-Tuning Transformer

Published in Neural Processing Letters.

Abstract

Compared with single-modal content, multimodal content conveys users' sentiments and feelings more vividly, so multimodal sentiment analysis has become a research hotspot. Because deep learning-based methods are data-hungry, transfer learning is extensively utilized. However, most transfer learning-based approaches transfer a model pre-trained on the source domain to the target domain either by treating it purely as a feature extractor (i.e., its parameters are frozen) or by applying a global fine-tuning strategy (i.e., all parameters are trainable). Both choices sacrifice advantages of one domain or the other. In this paper, we propose a novel Context Guided Adaptive Fine-tuning Transformer (CGAFT) that adaptively exploits the strengths of both source and target domains for image–text sentiment analysis. In CGAFT, a Context Guided Policy Network first produces optimal weights for each image–text instance. These weights indicate how much image sentiment information should be absorbed from each layer of the image model pre-trained on the source domain versus the parallel model fine-tuned on the target domain. The image–text instance and its weights are then fed into a Sentiment Analysis Network to extract contextual image sentiment representations drawn from both domains, enhancing image–text sentiment analysis performance. In addition, we observe that no publicly available image–text sentiment dataset exists in Chinese. To fill this gap, we build Flickr-ICT, an image–Chinese text dataset containing 13,874 image–Chinese text pairs. Experiments conducted on three image–text datasets demonstrate that CGAFT outperforms strong baselines.
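The core idea of the adaptive fine-tuning described above can be sketched as follows: for each image–text instance, a policy produces a per-layer weight in [0, 1] that blends features from a frozen source-domain layer with those from the parallel fine-tuned target-domain layer. This is a minimal illustrative sketch, not the paper's implementation; all function names, shapes, and weight values are hypothetical, and in the actual model the weights would come from the Context Guided Policy Network.

```python
def blend_layer(frozen_feat, finetuned_feat, w):
    """Blend one layer's features: weight w from the frozen source-domain
    layer, (1 - w) from the fine-tuned target-domain layer."""
    return [w * f + (1.0 - w) * t for f, t in zip(frozen_feat, finetuned_feat)]

def adaptive_forward(feats_frozen, feats_finetuned, policy_weights):
    """Apply per-layer blending across all layers for one instance."""
    return [
        blend_layer(fz, ft, w)
        for fz, ft, w in zip(feats_frozen, feats_finetuned, policy_weights)
    ]

# Toy example: two layers with 3-dimensional features (hypothetical values).
frozen = [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]]   # from the pre-trained model
tuned = [[0.0, 0.0, 0.0], [4.0, 4.0, 4.0]]    # from the fine-tuned model
weights = [0.5, 0.25]                          # per-layer, per-instance
out = adaptive_forward(frozen, tuned, weights)
print(out)  # [[0.5, 0.5, 0.5], [3.5, 3.5, 3.5]]
```

Instance-dependent weights are what distinguish this scheme from a fixed global blend: two different image–text pairs can draw on the source and target models in different proportions at every layer.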



Acknowledgements

This work is supported in part by the National Natural Science Foundation of China under Grants 61163019, 61271361, 61761046, U1802271, 61662087 and 62061049; the Yunnan Science and Technology Department Project under Grants 2014FA021 and 2018FB100; the Key Program of the Applied Basic Research Programs of Yunnan under Grants 202001BB050043 and 2019FA044; the Major Special Science and Technology of Yunnan under Grant 202002AD080001; and the Reserve Talents for Yunnan Young and Middle-aged Academic and Technical Leaders under Grant 2019HB121.

Author information


Corresponding author

Correspondence to Yuanyuan Pu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Xiao, X., Pu, Y., Zhao, Z. et al. Image–Text Sentiment Analysis Via Context Guided Adaptive Fine-Tuning Transformer. Neural Process Lett 55, 2103–2125 (2023). https://doi.org/10.1007/s11063-022-11124-w
