Abstract
Metaphors are ubiquitous in natural language, and metaphor detection, a key prerequisite for metaphor understanding, supports natural language processing tasks such as sentiment analysis, sarcasm interpretation, and text comprehension. Existing metaphor detection methods rely mainly on text, identifying metaphorical language through linguistic analysis alone. As a result, they overemphasize textual content, overlook visual metaphors, and lack effective mechanisms for integrating multimodal metaphor features. This paper proposes a visually enhanced metaphor detection model based on multimodal split fusion. Specifically, we first process image information with a multidimensional attention enhancement module, which applies channel and spatial attention in sequence to sharpen the recognition and processing of key visual features, improving the model's performance on visual tasks. To enable two-way interaction between multimodal metaphor features, we design a multimodal split-fusion module, which divides each modality's features into blocks of equal size and then aggregates and weights these blocks, strengthening the model's metaphor detection ability. Extensive experiments on the public multimodal metaphor dataset MET-Meme and the multimodal Sarcasm dataset verify the effectiveness of our model.
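For readers who want a concrete picture of the two modules described above, the sketch below shows one plausible PyTorch reading of them: a CBAM-style block that applies channel attention followed by spatial attention, and a split-fusion step that cuts each modality's feature vector into equal-size blocks before aggregating and weighting them. All class names, dimensions, and architectural details here are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    """Sequential channel-then-spatial attention (CBAM-style); a hypothetical
    stand-in for the paper's multidimensional attention enhancement module."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel attention: squeeze spatial dims, re-excite channels (SENet-style)
        self.channel_mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial attention: 7x7 conv over pooled per-pixel channel statistics
        self.spatial_conv = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, H, W)
        x = x * self.channel_mlp(x)                      # reweight channels
        avg_map = x.mean(dim=1, keepdim=True)            # (B, 1, H, W)
        max_map = x.amax(dim=1, keepdim=True)            # (B, 1, H, W)
        return x * self.spatial_conv(torch.cat([avg_map, max_map], dim=1))

class SplitFusion(nn.Module):
    """Divide each modality's feature vector into equal-size blocks, then
    aggregate and weight the blocks; a hypothetical reading of the paper's
    multimodal split-fusion module."""
    def __init__(self, dim: int, num_blocks: int = 8):
        super().__init__()
        assert dim % num_blocks == 0, "feature dim must split into equal blocks"
        self.num_blocks = num_blocks
        self.block_dim = dim // num_blocks
        # One scalar score per paired (text-block, image-block) after concatenation
        self.score = nn.Linear(2 * self.block_dim, 1)

    def forward(self, text: torch.Tensor, image: torch.Tensor) -> torch.Tensor:
        B = text.size(0)
        t = text.view(B, self.num_blocks, self.block_dim)   # text blocks
        v = image.view(B, self.num_blocks, self.block_dim)  # image blocks
        pairs = torch.cat([t, v], dim=-1)                   # (B, K, 2*block_dim)
        weights = torch.softmax(self.score(pairs), dim=1)   # block weights
        return (weights * pairs).sum(dim=1)                 # weighted aggregation

# Hypothetical usage: fuse 768-d text and image embeddings from any encoders.
# text_feat, img_feat = text_encoder(x_t), image_encoder(x_v)  # both (B, 768)
# fused = SplitFusion(dim=768)(text_feat, img_feat)            # (B, 192) with K=8

The key property of the split step is that attention weights are learned per block pair rather than per whole modality, so the fusion can emphasize the image blocks for visually grounded metaphors and the text blocks elsewhere.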
Data Availability Statement
No datasets were generated or analyzed during the current study.
Acknowledgements
This research was supported by the Natural Science Foundation of Xinjiang Uygur Autonomous Region (2023D01C176) and the Xinjiang Uygur Autonomous Region Universities Fundamental Research Funds Scientific Research Project (XJEDU2022P018). We sincerely thank these foundations for their support.
Author information
Contributions
M.H. was primarily responsible for conceptualization, methodology design, data visualization, drafting the original manuscript, and subsequent review and editing. Y.Q.M. handled data curation, provided supervision, and contributed to review and editing. Y.Y.B. oversaw supervision and participated in review and editing. G.S.S. conducted investigation and validation. W.Q.X. provided supervision. All authors reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yang, Q., Meng, H., Yan, Y. et al. SFVE: visual information enhancement metaphor detection with multimodal splitting fusion. J Supercomput 81, 467 (2025). https://doi.org/10.1007/s11227-025-06958-9