Multimodal Entity Linking with Gated Hierarchical Fusion and Contrastive Training

SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval

Pages 938 - 948

https://doi.org/10.1145/3477495.3531867

Published: 07 July 2022

Abstract

Previous entity linking methods for knowledge graphs (KGs) mostly link textual mentions to their corresponding entities. However, they handle multimodal data poorly, particularly when the text is too short to provide enough context. We therefore draw on information from other modalities and propose a novel multimodal entity linking method with gated hierarchical multimodal fusion and contrastive training (GHMFC). First, to discover fine-grained inter-modal correlations, GHMFC extracts hierarchical textual and visual features through a multimodal co-attention mechanism made up of two parts: textual-guided visual attention and visual-guided textual attention. The former produces weighted visual features under the guidance of textual information, while the latter produces weighted textual features under the guidance of visual information. Next, gated fusion evaluates the importance of the hierarchical features of each modality and integrates them into the final multimodal representations of mentions. Contrastive training with two types of contrastive losses is then used to learn more generic multimodal features and reduce noise. Finally, the linked entity is selected by computing the cosine similarity between the representations of mentions and those of entities in the KG. To evaluate the proposed method, this paper releases two new open multimodal entity linking datasets: WikiMEL and Richpedia-MEL. Experimental results demonstrate that GHMFC learns meaningful multimodal representations and significantly outperforms most of the baseline methods.
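The fusion and linking steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the gate's formulation, so the element-wise sigmoid gate below is an assumption, and the function names, toy vectors, and entity identifiers (`Q1`, `Q2`) are hypothetical. Only the final step, ranking candidate entities by cosine similarity, is stated in the abstract.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def gated_fusion(text_feat, visual_feat, gate_logits):
    """Blend two feature vectors element-wise with a sigmoid gate.

    Assumed form (not taken from the paper): g = sigmoid(z),
    fused = g * text + (1 - g) * visual.
    """
    fused = []
    for t, v, z in zip(text_feat, visual_feat, gate_logits):
        g = 1.0 / (1.0 + math.exp(-z))
        fused.append(g * t + (1.0 - g) * v)
    return fused


def link_entity(mention_repr, entity_reprs):
    """Return the id of the entity most similar to the mention."""
    return max(entity_reprs, key=lambda eid: cosine(mention_repr, entity_reprs[eid]))


# Toy inputs (hypothetical): in a real system these would come from
# the co-attention layers and a learned gate, not hand-written lists.
text_feat = [1.0, 0.0, 0.0]
visual_feat = [0.0, 1.0, 0.0]
gate_logits = [0.0, 0.0, 0.0]  # sigmoid(0) = 0.5, so both modalities weigh equally

mention = gated_fusion(text_feat, visual_feat, gate_logits)
entities = {"Q1": [1.0, 1.0, 0.0], "Q2": [0.0, 0.0, 1.0]}
best = link_entity(mention, entities)
print(best)
```

With a gate value of 0.5 in every dimension, the fused mention vector is the average of the two modality vectors, so it points in the same direction as the hypothetical entity `Q1` and is orthogonal to `Q2`. In a trained model the gate logits would be learned, letting the model lean on the visual features when the text is too short to disambiguate the mention.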

Supplementary Material

MP4 File (GHMFC_presentation.mp4)

Presentation video of our paper "Multimodal Entity Linking with Gated Hierarchical Fusion and Contrastive Training"


