PromptMNER: Prompt-Based Entity-Related Visual Clue Extraction and Integration for Multimodal Named Entity Recognition

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2022)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13247))

Abstract

Multimodal named entity recognition (MNER) is an emerging task that uses both visual and textual inputs to detect named entities and predict their entity types. However, existing MNER methods often fail to capture visual clues that are related to the entities but only loosely related to the text, and may therefore introduce task-irrelevant noise or even errors. To address this problem, we propose using entity-related prompts to extract the relevant visual clues with a pre-trained vision-language model. To better integrate the modalities and narrow the well-known semantic gap between them, we further propose a modality-aware attention mechanism for cross-modal fusion. Experimental results on two benchmarks show that our approach outperforms state-of-the-art MNER approaches by a large margin.
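
As a rough illustration of the prompt-based idea described above, the sketch below scores a set of entity-related prompts against an image in a shared embedding space and pools them into a single "visual clue" vector. It is not the paper's implementation: the abstract does not state the scoring function, so cosine similarity with a temperature-scaled softmax (as commonly used with CLIP-style vision-language models) is an assumption, and the prompt texts, the toy 3-dimensional embeddings, and the function names `prompt_weights` and `visual_clue` are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def prompt_weights(image_vec, prompt_vecs, temperature=0.1):
    """Softmax over image-prompt similarities (assumed scoring scheme)."""
    logits = [cosine(image_vec, p) / temperature for p in prompt_vecs]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def visual_clue(image_vec, prompt_vecs, temperature=0.1):
    """Weighted average of prompt embeddings, used here as the visual clue."""
    weights = prompt_weights(image_vec, prompt_vecs, temperature)
    dim = len(prompt_vecs[0])
    return [sum(w * p[i] for w, p in zip(weights, prompt_vecs)) for i in range(dim)]


# Hypothetical prompts with toy embeddings; a real system would obtain these
# from the text and image encoders of a pre-trained vision-language model.
prompts = {
    "an image of a person": [1.0, 0.0, 0.0],
    "an image of a location": [0.0, 1.0, 0.0],
    "an image of an organization": [0.0, 0.0, 1.0],
}
image = [0.9, 0.1, 0.0]  # toy image embedding

weights = prompt_weights(image, list(prompts.values()))
best_prompt = max(zip(prompts, weights), key=lambda kv: kv[1])[0]
clue = visual_clue(image, list(prompts.values()))
```

In this toy setup the image embedding lies closest to the first prompt, so that prompt receives most of the weight and dominates the pooled vector. How such a vector is then fused with the text representation (the modality-aware attention mentioned in the abstract) is not covered by this sketch.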

This work was conducted while Min Gui was working at Alibaba.

Notes

  1. http://relatedwords.org

Acknowledgement

This research was supported by the National Key Research and Development Project (No. 2020AAA0109302), the National Natural Science Foundation of China (No. 62072323), the Shanghai Science and Technology Innovation Action Plan (No. 19511120400), the Shanghai Municipal Science and Technology Major Project (No. 2021SHZDZX0103), and the Alibaba Research Intern Program.

Author information

Corresponding authors

Correspondence to Zhixu Li or Yanghua Xiao.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Wang, X. et al. (2022). PromptMNER: Prompt-Based Entity-Related Visual Clue Extraction and Integration for Multimodal Named Entity Recognition. In: Bhattacharya, A., et al. Database Systems for Advanced Applications. DASFAA 2022. Lecture Notes in Computer Science, vol 13247. Springer, Cham. https://doi.org/10.1007/978-3-031-00129-1_24

  • DOI: https://doi.org/10.1007/978-3-031-00129-1_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-00128-4

  • Online ISBN: 978-3-031-00129-1

  • eBook Packages: Computer Science, Computer Science (R0)
