research-article

Multimodal Entity Linking: A New Dataset and A Baseline

Authors:

Qingming Huang

MM '21: Proceedings of the 29th ACM International Conference on Multimedia

Pages 993 - 1001

https://doi.org/10.1145/3474085.3475400

Published: 17 October 2021

Abstract

In this paper, we introduce a new task, Multimodal Entity Linking (MEL), on multimodal data. The MEL task discovers entities that appear in multiple modalities and various forms within large-scale multimodal data, and maps multimodal mentions in a document to entities in a structured knowledge base such as Wikipedia. Unlike the conventional Neural Entity Linking (NEL) task, which relies solely on textual information, MEL aims at human-level disambiguation among entities in images, texts, and knowledge bases. Because sufficient labeled data for the MEL task is lacking, we release a large-scale multimodal entity linking dataset, M3EL (short for MultiModal Movie Entity Linking). Specifically, we collect reviews and images of 1,100 movies, extract textual and visual mentions, and label them with entities registered in Wikipedia. In addition, we construct a new baseline method for the MEL problem, which models the alignment of textual and visual mentions as a bipartite graph matching problem and solves it with an optimal-transportation-based linking method. Extensive experiments on the M3EL dataset verify the quality of the dataset and the effectiveness of the proposed method. We hope this work will encourage more research effort and applications in multimodal computing and inference. The dataset and the baseline algorithm are publicly available at https://jingrug.github.io/research/M3EL.
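
To make the idea of aligning textual and visual mentions through optimal transport more concrete, the following is a minimal sketch rather than the paper's implementation. It assumes each mention is already encoded as a vector, uses cosine distance as the matching cost, and swaps in the standard Sinkhorn iteration for entropy-regularised optimal transport. The embeddings, the function names `sinkhorn` and `cosine_cost`, and the values of `reg` and `n_iters` are all hypothetical choices made for illustration.

```python
import numpy as np


def cosine_cost(x, y):
    """Pairwise cosine distance between rows of x and rows of y."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    y = y / np.linalg.norm(y, axis=1, keepdims=True)
    return 1.0 - x @ y.T


def sinkhorn(cost, reg=0.1, n_iters=200):
    """Entropy-regularised optimal transport with uniform marginals.

    cost: (n_text, n_visual) matrix of matching costs.
    Returns a transport plan of the same shape; a large entry (i, j)
    means textual mention i is softly matched to visual mention j.
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)      # mass placed on each textual mention
    b = np.full(m, 1.0 / m)      # mass placed on each visual mention
    K = np.exp(-cost / reg)      # Gibbs kernel derived from the cost
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):     # alternate scaling of rows and columns
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]


# Hypothetical toy embeddings: three textual and three visual mentions.
text_emb = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [0.0, 0.0, 1.0]])
visual_emb = np.array([[0.1, 0.9, 0.0],
                       [0.0, 0.1, 0.9],
                       [0.9, 0.0, 0.1]])

plan = sinkhorn(cosine_cost(text_emb, visual_emb))
matches = plan.argmax(axis=1)    # hard assignment read off the soft plan
print(matches)                   # [2 0 1]
```

Reading the hard assignment with `argmax` is only one option; the soft plan itself could also serve as a set of alignment weights when scoring candidate entities.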

Index Terms

Multimodal Entity Linking: A New Dataset and A Baseline
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Information extraction

Published In

MM '21: Proceedings of the 29th ACM International Conference on Multimedia

October 2021

5796 pages

ISBN:9781450386517

DOI:10.1145/3474085

General Chairs:
Heng Tao Shen, University of Electronic Science & Technology of China, China
Yueting Zhuang, Zhejiang University, China
John R. Smith, IBM, USA

Program Chairs:
Yang Yang, University of Electronic Science and Technology of China, China
Pablo Cesar, CWI & TU Delft, The Netherlands
Florian Metze, FACEBOOK, Inc., USA
Balakrishnan Prabhakaran, University of Texas at Dallas, USA

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Key Research and Development Program of China
Key Research Program of Frontier Science, Chinese Academy of Sciences
National Natural Science Foundation of China

Conference

MM '21

Sponsor:

SIGMM

MM '21: ACM Multimedia Conference

October 20 - 24, 2021

Virtual Event, China

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
