
Query Prior Matters: A MRC Framework for Multimodal Named Entity Recognition

Authors:

Xiaodong He

MM '22: Proceedings of the 30th ACM International Conference on Multimedia

Pages 3549 - 3558

https://doi.org/10.1145/3503161.3548427

Published: 10 October 2022

Abstract

Multimodal named entity recognition (MNER) is a vision-language task where the system is required to detect entity spans and corresponding entity types given a sentence-image pair. Existing methods capture text-image relations with various attention mechanisms that only obtain implicit alignments between entity types and image regions. To locate regions more accurately and better model cross-/within-modal relations, we propose a machine reading comprehension (MRC) based framework for MNER, namely MRC-MNER. By utilizing queries in MRC, our framework can provide prior information about entity types and image regions. Specifically, we design two stages, Query-Guided Visual Grounding and Multi-Level Modal Interaction, to align fine-grained type-region information and simulate text-image/inner-text interactions, respectively. For the former, we train a visual grounding model via transfer learning to extract region candidates that can be further integrated into the second stage to enhance token representations. For the latter, we design text-image and inner-text interaction modules along with three sub-tasks for MRC-MNER. To verify the effectiveness of our model, we conduct extensive experiments on two public MNER datasets, Twitter2015 and Twitter2017. Experimental results show that MRC-MNER outperforms the current state-of-the-art models on Twitter2017 and yields competitive results on Twitter2015.
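To make the idea of a query acting as a type prior concrete, the following is a minimal, text-only sketch of how an MRC-style formulation of NER is commonly set up: the sentence is paired with one natural-language query per entity type, and spans are decoded from per-token start and end predictions. The query wording, the four-type inventory, the start/end decoding rule, and the names `TYPE_QUERIES`, `build_mrc_inputs`, and `decode_spans` are all assumptions made for illustration; the abstract does not specify the queries or the three sub-tasks, and the visual grounding stage is omitted entirely.

```python
from typing import Dict, List, Tuple

# Hypothetical natural-language queries, one per entity type. Both the wording
# and the type inventory are illustrative assumptions, not taken from the paper.
TYPE_QUERIES: Dict[str, str] = {
    "PER": "Person: names of people and fictional characters.",
    "LOC": "Location: countries, cities, and geographic places.",
    "ORG": "Organization: companies, agencies, and institutions.",
    "MISC": "Miscellaneous: events, products, and works of art.",
}


def build_mrc_inputs(tokens: List[str]) -> List[Tuple[str, str, List[str]]]:
    """Pair the sentence with one query per entity type.

    Each (type, query, tokens) triple would be encoded separately, so the
    query tells the model which entity type to look for.
    """
    return [(etype, query, tokens) for etype, query in TYPE_QUERIES.items()]


def decode_spans(start_flags: List[int], end_flags: List[int]) -> List[Tuple[int, int]]:
    """Match each predicted start with the nearest predicted end at or after it."""
    spans = []
    for i, is_start in enumerate(start_flags):
        if not is_start:
            continue
        for j in range(i, len(end_flags)):
            if end_flags[j]:
                spans.append((i, j))
                break
    return spans


tokens = ["Kevin", "Durant", "joins", "the", "Warriors"]
inputs = build_mrc_inputs(tokens)

# Toy start/end predictions for the "PER" query: one span over tokens 0..1.
per_spans = decode_spans([1, 0, 0, 0, 0], [0, 1, 0, 0, 0])
per_entities = [" ".join(tokens[s:e + 1]) for s, e in per_spans]
```

In this sketch the entity type is never predicted as a label; it is fixed by the query that produced the span, which is one way to read the claim that queries supply prior information about entity types.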

Supplementary Material

MP4 File (MM22-fp3216.mp4)

This is a presentation video for the paper "Query Prior Matters: A MRC Framework for Multimodal Named Entity Recognition". In it, we present the paper's contributions, framework, algorithm details, experimental results, and analysis, and carry out further qualitative analysis on two specific cases. We hope the video helps audiences understand our work quickly and helps more people discover it.

File size: 27.42 MB

