Abstract
Multimodal named entity recognition (MNER) is an emerging task that incorporates visual and textual inputs to detect named entities and predict their corresponding entity types. However, existing MNER methods often fail to capture visual clues that are related to the entities but only loosely related to the text, which may introduce task-irrelevant noise or even errors. To address this problem, we propose to utilize entity-related prompts for extracting proper visual clues with a pre-trained vision-language model. To better integrate different modalities and address the well-known semantic gap problem, we further propose a modality-aware attention mechanism for cross-modal fusion. Experimental results on two benchmarks show that our MNER approach outperforms the state-of-the-art MNER approaches by a large margin.
This work was conducted when Min Gui worked at Alibaba.
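The abstract does not name the vision-language model or the prompt templates, so the following is only a minimal sketch of the general idea of matching an image against entity-related prompts. It assumes a CLIP-style model that embeds images and text prompts into a shared space; the toy vectors stand in for real encoder outputs, and the prompt strings, function names, and temperature value are all hypothetical rather than taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of scores."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def prompt_match_weights(image_vec, prompt_vecs, temperature=0.1):
    """Score each prompt embedding against the image embedding.

    Assumption: both embeddings come from the same pre-trained
    vision-language model, so cosine similarity is meaningful.
    """
    sims = [cosine(image_vec, p) for p in prompt_vecs]
    return softmax(sims, temperature)

# Hypothetical entity-related prompts (not the paper's templates).
prompts = [
    "an image of a person",
    "an image of a location",
    "an image of an organization",
]
# Toy embeddings standing in for text-encoder and image-encoder outputs.
prompt_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
image_vec = [0.9, 0.1, 0.0]

weights = prompt_match_weights(image_vec, prompt_vecs)
best_prompt = prompts[weights.index(max(weights))]
```

In this sketch the weights form a distribution over prompts that a downstream NER model could consume as a visual clue. How the paper actually integrates such clues with the text is not specified in the abstract beyond the modality-aware attention mechanism.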
Acknowledgement
This research was supported by the National Key Research and Development Project (No. 2020AAA0109302), the National Natural Science Foundation of China (No. 62072323), the Shanghai Science and Technology Innovation Action Plan (No. 19511120400), the Shanghai Municipal Science and Technology Major Project (No. 2021SHZDZX0103), and the Alibaba Research Intern Program.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, X. et al. (2022). PromptMNER: Prompt-Based Entity-Related Visual Clue Extraction and Integration for Multimodal Named Entity Recognition. In: Bhattacharya, A., et al. Database Systems for Advanced Applications. DASFAA 2022. Lecture Notes in Computer Science, vol 13247. Springer, Cham. https://doi.org/10.1007/978-3-031-00129-1_24
DOI: https://doi.org/10.1007/978-3-031-00129-1_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-00128-4
Online ISBN: 978-3-031-00129-1
eBook Packages: Computer Science, Computer Science (R0)