PromptMNER: Prompt-Based Entity-Related Visual Clue Extraction and Integration for Multimodal Named Entity Recognition

Wang, Xuwu; Tian, Junfeng; Gui, Min; Li, Zhixu; Ye, Jiabo; Yan, Ming; Xiao, Yanghua

doi:10.1007/978-3-031-00129-1_24


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13247)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications


Abstract

Multimodal named entity recognition (MNER) is an emerging task that combines visual and textual inputs to detect named entities and predict their entity types. However, existing MNER methods often fail to capture visual clues in the image that are related to the entities but only loosely related to the text, which may introduce task-irrelevant noise or even errors. To address this problem, we propose using entity-related prompts to extract appropriate visual clues with a pre-trained vision-language model. To better integrate the different modalities and address the well-known semantic gap problem, we further propose a modality-aware attention mechanism for cross-modal fusion. Experimental results on two benchmarks show that our approach outperforms state-of-the-art MNER approaches by a large margin.
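The abstract does not specify the prompts, the vision-language model, or how the extracted clues are represented. As a rough illustration of the general idea only, the sketch below assumes a CLIP-style setup in which an image embedding and the text embeddings of entity-related prompts (for example "a photo of a person") already share one embedding space. Each prompt is scored by temperature-scaled cosine similarity to the image. The labels, toy embeddings, temperature value, and the `clue_vector` pooling step are all hypothetical assumptions, not PromptMNER's implementation.

```python
import math

# A minimal sketch of CLIP-style prompt scoring. All embeddings are assumed to
# come from a pre-trained vision-language model whose image and text encoders
# share one embedding space; here they are plain Python lists.


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def softmax(xs, temperature=1.0):
    """Numerically stable softmax with a temperature (lower = sharper)."""
    scaled = [x / temperature for x in xs]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def prompt_clues(image_emb, prompt_embs, temperature=0.01):
    """Weight each entity-related prompt by its similarity to the image.

    prompt_embs maps a hypothetical label (e.g. "PER") to the text embedding
    of a prompt such as "a photo of a person". Returns label -> weight.
    The temperature of 0.01 is a placeholder value.
    """
    labels = list(prompt_embs)
    sims = [cosine(image_emb, prompt_embs[label]) for label in labels]
    return dict(zip(labels, softmax(sims, temperature)))


def clue_vector(weights, prompt_embs):
    """Pool prompt embeddings by their weights into one visual-clue vector.

    This pooling is a hypothetical choice; a fusion layer could consume it.
    """
    dim = len(next(iter(prompt_embs.values())))
    vec = [0.0] * dim
    for label, w in weights.items():
        for i, x in enumerate(prompt_embs[label]):
            vec[i] += w * x
    return vec


if __name__ == "__main__":
    image = [0.9, 0.1, 0.0]  # toy image embedding
    prompts = {  # toy prompt-text embeddings
        "PER": [1.0, 0.0, 0.0],
        "LOC": [0.0, 1.0, 0.0],
        "ORG": [0.0, 0.0, 1.0],
    }
    weights = prompt_clues(image, prompts)
    print(max(weights, key=weights.get))  # → PER
    print(clue_vector(weights, prompts))
```

A lower temperature concentrates the weight on the single best-matching prompt, while a higher one spreads it across several clues. How PromptMNER actually uses the clue scores, and the details of its modality-aware attention, are not described in the abstract and are not modeled here.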

This work was conducted when Min Gui worked at Alibaba.


Notes

1. http://relatedwords.org


Acknowledgement

This research was supported by the National Key Research and Development Project (No. 2020AAA0109302), the National Natural Science Foundation of China (No. 62072323), the Shanghai Science and Technology Innovation Action Plan (No. 19511120400), the Shanghai Municipal Science and Technology Major Project (No. 2021SHZDZX0103), and the Alibaba Research Intern Program.

Author information

Authors and Affiliations

Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University, Shanghai, China
Xuwu Wang, Zhixu Li & Yanghua Xiao
Alibaba DAMO Academy, Hangzhou, China
Junfeng Tian & Ming Yan
Shopee, Singapore, Singapore
Min Gui
East China Normal University, Shanghai, China
Jiabo Ye
Fudan-Aishu Cognitive Intelligence Joint Research Center, Shanghai, China
Yanghua Xiao


Corresponding authors

Correspondence to Zhixu Li or Yanghua Xiao.

Editor information

Editors and Affiliations

Indian Institute of Technology Kanpur, Kanpur, India
Arnab Bhattacharya
National University of Singapore, Singapore, Singapore
Janice Lee Mong Li
University of California, Santa Barbara, Santa Barbara, CA, USA
Divyakant Agrawal
IIIT Hyderabad, Hyderabad, India
P. Krishna Reddy
Indraprastha Institute of Information Technology Delhi, New Delhi, India
Mukesh Mohania
Ashoka University, Sonepat, Haryana, India
Anirban Mondal
Indraprastha Institute of Information Technology Delhi, New Delhi, India
Vikram Goyal
University of Aizu, Aizu, Japan
Rage Uday Kiran



About this paper

Cite this paper

Wang, X. et al. (2022). PromptMNER: Prompt-Based Entity-Related Visual Clue Extraction and Integration for Multimodal Named Entity Recognition. In: Bhattacharya, A., et al. Database Systems for Advanced Applications. DASFAA 2022. Lecture Notes in Computer Science, vol 13247. Springer, Cham. https://doi.org/10.1007/978-3-031-00129-1_24


DOI: https://doi.org/10.1007/978-3-031-00129-1_24
Published: 08 April 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-00128-4
Online ISBN: 978-3-031-00129-1
eBook Packages: Computer Science, Computer Science (R0)
