DOI: 10.1145/3474085.3475400
research-article

Multimodal Entity Linking: A New Dataset and A Baseline

Published: 17 October 2021

ABSTRACT

In this paper, we introduce a new Multimodal Entity Linking (MEL) task on multimodal data. The MEL task discovers entities in multiple modalities and various forms within large-scale multimodal data and maps multimodal mentions in a document to entities in a structured knowledge base such as Wikipedia. Unlike the conventional Neural Entity Linking (NEL) task, which focuses solely on textual information, MEL aims to achieve human-level disambiguation among entities in images, texts, and knowledge bases. Because sufficient labeled data for the MEL task is lacking, we release a large-scale multimodal entity linking dataset, M3EL (short for MultiModal Movie Entity Linking). Specifically, we collect reviews and images of 1,100 movies, extract textual and visual mentions, and label them with entities registered in Wikipedia. In addition, we construct a new baseline method for the MEL problem, which models the alignment of textual and visual mentions as a bipartite graph matching problem and solves it with an optimal-transport-based linking method. Extensive experiments on the M3EL dataset verify the quality of the dataset and the effectiveness of the proposed method. We hope this work will encourage further research and applications in multimodal computing and inference. We make the dataset and the baseline algorithm publicly available at https://jingrug.github.io/research/M3EL.

    Published in

      ACM Conferences
      MM '21: Proceedings of the 29th ACM International Conference on Multimedia
      October 2021
      5796 pages
      ISBN:9781450386517
      DOI:10.1145/3474085

      Copyright © 2021 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall acceptance rate: 995 of 4,171 submissions, 24%
