Abstract:
Visual Word Sense Disambiguation (Visual-WSD), as a sub-task of fine-grained image-text retrieval, requires a high level of language-vision understanding to capture and exploit the nuanced relationships between textual and visual features. However, the cross-lingual setting, combined with only limited contextual information, is considered the most significant challenge for this task. In this paper, we propose MTA, which employs a new approach to multilingual contrastive learning with self-distillation: fine-grained textual features are aligned to fixed vision features, and non-English textual features are aligned to English textual momentum features. The model is lightweight and end-to-end, since it requires neither updating the visual encoder nor any translation operations. Furthermore, a trilingual fine-grained image-text dataset is developed, and a ChatGPT API module is integrated to effectively enrich word senses during the testing phase. Extensive experiments show that MTA achieves state-of-the-art results on the benchmark English, Farsi, and Italian datasets of SemEval-2023 Task 1 and exhibits impressive generalization when dealing with variations in text length and language.
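To make the dual alignment objective concrete, below is a minimal PyTorch sketch of the training loss the abstract describes: one contrastive term aligning text features to frozen (detached) vision features, and one self-distillation term pulling non-English text features toward English momentum-encoder features. The function names, the symmetric InfoNCE formulation, the EMA momentum update, and the weighting `alpha` are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch (assumed PyTorch formulation, not the paper's exact code).
import torch
import torch.nn.functional as F

def contrastive_loss(a, b, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired, L2-normalized embeddings."""
    logits = a @ b.t() / temperature
    targets = torch.arange(a.size(0), device=a.device)  # matched pairs on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

def mta_style_loss(text_feat, vision_feat, en_momentum_feat, alpha=0.5):
    # Project all features onto the unit hypersphere.
    text_feat = F.normalize(text_feat, dim=-1)
    vision_feat = F.normalize(vision_feat, dim=-1)            # frozen vision encoder output
    en_momentum_feat = F.normalize(en_momentum_feat, dim=-1)  # EMA English text encoder output

    # (1) Align fine-grained textual features to the fixed vision features.
    loss_tv = contrastive_loss(text_feat, vision_feat.detach())
    # (2) Self-distillation: align non-English text features with
    #     English momentum (teacher) features.
    loss_distill = contrastive_loss(text_feat, en_momentum_feat.detach())
    return loss_tv + alpha * loss_distill  # alpha is an assumed weighting

@torch.no_grad()
def momentum_update(student, teacher, m=0.995):
    """Standard EMA update for a momentum (teacher) encoder."""
    for p_s, p_t in zip(student.parameters(), teacher.parameters()):
        p_t.data.mul_(m).add_(p_s.data, alpha=1 - m)

if __name__ == "__main__":
    B, D = 8, 512
    t = torch.randn(B, D)  # non-English text features (trainable encoder)
    v = torch.randn(B, D)  # frozen vision features
    e = torch.randn(B, D)  # English momentum-encoder features
    print(mta_style_loss(t, v, e).item())
```

Because the vision features are detached, gradients flow only through the text encoder, which is consistent with the abstract's claim that the visual encoder is never updated.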
Published in: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 14-19 April 2024
Date Added to IEEE Xplore: 18 March 2024