P-MNER: Cross Modal Correction Fusion Network with Prompt Learning for Multimodal Named Entity Recognition

Wang, Zhuang; Zhang, Yijia; An, Kang; Zhou, Xiaoying; Lu, Mingyu; Lin, Hongfei

doi:10.1007/978-981-99-6207-5_13

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14232)

Included in the following conference series:

China National Conference on Chinese Computational Linguistics

Abstract

Multimodal Named Entity Recognition (MNER) on social media is challenging because it must combine text and image features. Previous MNER work has focused on predicting entity information after fusing visual and text features. However, pre-trained language models have already acquired vast amounts of knowledge during pre-training. To leverage this knowledge, we propose a prompt network for MNER tasks (P-MNER). To minimize the noise generated by irrelevant areas in the image, we design a visual feature extraction model (FRR) based on Faster R-CNN and ResNet, which uses fine-grained visual features to assist MNER tasks. Moreover, we introduce a text correction fusion module (TCFM) into the model to address visual bias during modal fusion. Following the idea of a residual network, this module modifies the fused features using the original text features. Experiments on two benchmark datasets demonstrate that the proposed model outperforms existing MNER methods. P-MNER's ability to leverage pre-training knowledge from language models, incorporate fine-grained visual features, and correct for visual bias makes it a promising approach for multimodal named entity recognition in social media posts.
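The abstract does not give the exact formulation of the text correction fusion module, so the following is only a minimal sketch of the general residual idea it mentions: the original text features are added back onto the fused features so that the text signal is not overwhelmed by visual information. The function names (`layer_norm`, `residual_text_correction`), the use of layer normalization after the sum, and the toy feature values are all assumptions made for illustration and are not taken from the paper.

```python
import math
from typing import List

Vector = List[float]


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """Rescale a vector to zero mean and (approximately) unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def residual_text_correction(fused: List[Vector], text: List[Vector]) -> List[Vector]:
    """Hypothetical residual correction: add each token's original text
    feature to its fused (text + image) feature, then normalize.

    This is an assumed, generic residual connection, not the paper's TCFM.
    """
    if len(fused) != len(text):
        raise ValueError("fused and text must cover the same number of tokens")
    corrected = []
    for fused_tok, text_tok in zip(fused, text):
        if len(fused_tok) != len(text_tok):
            raise ValueError("feature dimensions must match for a residual sum")
        summed = [f + t for f, t in zip(fused_tok, text_tok)]
        corrected.append(layer_norm(summed))
    return corrected


# Toy example: two tokens with four-dimensional features (made-up values).
text_feats = [[1.0, 0.0, -1.0, 2.0], [0.5, 0.5, 0.0, -1.0]]
fused_feats = [[0.2, 0.1, -0.3, 0.4], [1.0, -1.0, 0.3, 0.0]]
corrected_feats = residual_text_correction(fused_feats, text_feats)
```

Because the text features enter the output through an identity path, any bias introduced during fusion can shift the representation but cannot fully replace the textual signal, which is the usual motivation for a residual connection.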

This work is supported by the National Natural Science Foundation of China (No. 61976124).

Author information

Authors and Affiliations

College of Information Science and Technology, Dalian Maritime University, Dalian, China
Zhuang Wang, Yijia Zhang, Kang An & Xiaoying Zhou
College of Artificial Intelligence, Dalian Maritime University, Dalian, China
Mingyu Lu
College of Computer Science and Technology, Dalian University of Technology, Dalian, China
Hongfei Lin

Corresponding authors

Correspondence to Yijia Zhang or Mingyu Lu.

Editor information

Editors and Affiliations

Department of Computer Science and Technology, Tsinghua University, Beijing, China
Maosong Sun
Harbin Institute of Technology, Harbin, China
Bing Qin
Fudan University, Shanghai, China
Xipeng Qiu
School of Computing and Information, Singapore Management University, Singapore, Singapore
Jiang Jing
Institute of Software, Chinese Academy of Sciences, Beijing, China
Xianpei Han
Beijing Language and Culture University, Beijing, China
Gaoqi Rao
Chinese Academy of Sciences, Institute of Automation, Beijing, China
Yubo Chen

About this paper

Cite this paper

Wang, Z., Zhang, Y., An, K., Zhou, X., Lu, M., Lin, H. (2023). P-MNER: Cross Modal Correction Fusion Network with Prompt Learning for Multimodal Named Entity Recognition. In: Sun, M., et al. (eds.) Chinese Computational Linguistics. CCL 2023. Lecture Notes in Computer Science, vol 14232. Springer, Singapore. https://doi.org/10.1007/978-981-99-6207-5_13

DOI: https://doi.org/10.1007/978-981-99-6207-5_13
Published: 20 September 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-6206-8
Online ISBN: 978-981-99-6207-5
eBook Packages: Computer Science, Computer Science (R0)
