ABSTRACT
We introduce Multimodal Entity Linking (MEL), a new task for multimodal data. MEL discovers entities that appear in multiple modalities and forms within large-scale multimodal data, and maps the multimodal mentions in a document to entities in a structured knowledge base such as Wikipedia. Unlike conventional Neural Entity Linking (NEL), which relies solely on textual information, MEL aims at human-level disambiguation among entities in images, texts, and knowledge bases. Because sufficient labeled data for MEL is lacking, we release M3EL (short for MultiModal Movie Entity Linking), a large-scale multimodal entity linking dataset. Specifically, we collect reviews and images of 1,100 movies, extract textual and visual mentions, and label them with entities registered in Wikipedia. We also construct a new baseline method for MEL, which models the alignment of textual and visual mentions as a bipartite graph matching problem and solves it with an optimal-transportation-based linking method. Extensive experiments on M3EL verify the quality of the dataset and the effectiveness of the proposed method. We hope this work will encourage further research and applications in multimodal computing and inference. The dataset and the baseline algorithm are publicly available at https://jingrug.github.io/research/M3EL.
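To make the idea of aligning textual and visual mentions through optimal transport more concrete, the following is a minimal sketch and not the authors' baseline. It assumes each mention is already represented by an embedding vector, uses a cosine-distance cost, and solves an entropy-regularised transport problem with Sinkhorn iterations, which is one common solver that the abstract does not name. The function names (`cosine_cost`, `sinkhorn`, `align`), the uniform marginals, and the parameters `eps` and `n_iters` are all illustrative assumptions.

```python
import math


def cosine_cost(a, b):
    """Cost between two (non-zero) embedding vectors: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def sinkhorn(cost, eps=0.1, n_iters=200):
    """Entropy-regularised optimal transport with uniform marginals.

    cost is an n x m list of lists; returns the n x m transport plan.
    eps and n_iters are illustrative defaults, not values from the paper.
    """
    n, m = len(cost), len(cost[0])
    row_mass = [1.0 / n] * n   # assumed uniform mass on textual mentions
    col_mass = [1.0 / m] * m   # assumed uniform mass on visual mentions
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [row_mass[i] / sum(kernel[i][j] * v[j] for j in range(m))
             for i in range(n)]
        v = [col_mass[j] / sum(kernel[i][j] * u[i] for i in range(n))
             for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def align(text_embs, image_embs):
    """For each textual mention, return the index of the visual mention
    that receives the most transported mass."""
    cost = [[cosine_cost(t, g) for g in image_embs] for t in text_embs]
    plan = sinkhorn(cost)
    return [max(range(len(row)), key=row.__getitem__) for row in plan]


# Toy example with made-up 2-d embeddings (hypothetical data).
text_mentions = [[1.0, 0.0], [0.0, 1.0]]
visual_mentions = [[0.1, 0.9], [0.9, 0.1]]
matches = align(text_mentions, visual_mentions)
```

In this toy setup each textual mention is paired with the visual mention whose embedding points in nearly the same direction. A real system would replace the hand-written vectors with learned text and image encoders and would still need a separate step to map the aligned mentions to knowledge-base entities.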